ANSI/ISO C++ Professional Programmer's Handbook

Contents


10

STL and Generic Programming

by Danny Kalev

Introduction

Object-oriented design offers a limited form of code reuse inheritance and polymorphism. The generic programming paradigm is designed to enable a higher level of reusability. Instead of data hiding, it relies on data independence. C++ has two features that support data independence: templates and operator overloading. A combination of these features allows a generic algorithm to assume very little about the actual object to which it is applied, whether it is a fundamental type or a user-defined type. Consequently, such an algorithm is not confined to a specific data type, and it has a higher reusability potential than does a type-dependent algorithm.

The Standard Template Library (STL) is an exemplary framework that is built on the foundations of generic programming. STL is a collection of generic algorithms and containers that communicate through iterators. This chapter explores the principles of generic programming, focusing on STL. A complete account of every STL container and algorithm can fill a book of its own, so this chapter only discusses the basic concepts of generic programming. It starts with an overview of STL header files. STL components are discussed next: containers, iterators, algorithms, function objects, adaptors, and allocators. This discussion presents some of the most widely used containers and algorithms of STL. Finally, class string is described in detail.

Generic Programming

Generic software is primarily reusable software. Reusability is characterized by two key features: adaptability and efficiency. It is not difficult to imagine highly adaptive software components that are too inefficient to become widely used (these are usually implemented by complex inheritance hierarchies, virtual functions, and extensive use of runtime type information). Conversely, efficient components are generally written in low-level, platform-dependent code that is both nonportable and hard to maintain. Templates overcome these difficulties because they are checked at compile time rather than at runtime, because they do not require any inheritance relation among objects, and because they are applicable to fundamental types. The most useful generic components are containers and algorithms. For years, programmers were implementing their own lists, queues, sets, and other container types to make up for the lack of language support; however, homemade containers suffer from significant drawbacks. They are not portable, they are sometimes less than 100% bug free, their interfaces vary from one implementation to another, and they can be less than optimal in terms of runtime performance and memory usage.

In the latest phases of the standardization of C++, Alex Stepanov suggested adding a generic library of containers and algorithms to C++. He based his proposal on a similar generic library that he had previously designed for Ada. At that time (November 1993), the committee was under pressure to complete the ongoing standardization process as fast as possible. Consequently, suggestions for language extensions were rejected one after another. However, Stepanov's proposal was too good to be forsaken the committee adopted it unanimously.

The proposed generic library was a collection of containers based on mathematical data models such as vector, queue, list, and stack. It also contained a set of generic algorithms such as sort, merge, find, replace, and so on. These library constituents were implemented with templates. Still, templates alone are insufficient because fundamental types, pointers, user-defined types, and single bits are manipulated by different language constructs and operators. Operator overloading provides the necessary uniform interface that abstracts the actual data type of a container or an algorithm. The following section examines these components in greater detail.

Organization of STL Header Files

STL components are grouped under namespace std. They are defined in the following header files. Note that prestandardized implementations of STL might use different header names, as indicated below.

Containers

Container classes are defined in the following header files (see Table 10.1). The associative containers multimap and multiset are defined in <map> and <set>, respectively. Similarly, priority_queue and deque are defined in <queue>. (On some prestandardized implementations, the container adaptors stack, queue, and priority_queue are in <stack.h>).

Table 10.1 STL Containers

Header

Contents

<vector>

An array of T

<list>

A doubly-linked list of T

<deque>

A double-ended queue of T

<queue>

A queue of T

<stack>

A stack of T

<map>

An associative array of T

<set>

A set of T

<bitset>

A set of Boolean values

Algorithms

STL generic algorithms can be applied to a sequence of elements. They are defined in the following header file (see Table 10.2). (On prestandardized implementations, generic algorithms are defined in <algo.h>).

Table 10.2 STL Algorithms

Header

Contents

<algorithm>

A collection of generic algorithms

Iterators

Iterators are used to navigate sequences. They are defined in the following header file (see Table 10.3)

Table 10.3 STL Iterators

Header

Contents

<iterator>

Various types of iterators and iterator support

Numeric Library

STL provides several classes and algorithms that are specifically designed for numeric computations (see Table 10.4).

Table 10.4 Numeric Containers and Algorithms

Header

Contents

<complex>

Complex numbers and their associated operations

<valarray>

Mathematical vectors and their associated operations

<numerics>

Generalized numeric operations

Utilities

The following headers define auxiliary components that are used in STL containers and algorithms (see Table 10.5). These include function adaptors, pairs, and class auto_ptr (discussed later).

Table 10.5 General Utilities

Header

Contents

<utility>

Operators and pairs

<functional>

Function objects

<memory>

Allocators and auto_ptr

Containers

A container is an object that can hold other objects as its elements. A generic container is not confined to a specific type it can store objects of any kind. C supports one container type in the form of built-in arrays. Other languages support other data models. Pascal, for example, has a built-in set type, and Lisp supports lists (hence its name). C++ inherited from C its support for arrays. Arrays have several properties that more or less correspond to the mathematical notion of a vector: They can store any data type and they provide random access that is, the time needed to access any element is identical, regardless of the element's position.

Still, under some circumstances, arrays are less convenient than other data models; it is impossible to insert a new element in the middle of an array. Also, you cannot append new elements to the end of an array. With other data models, (a list, for example), it is possible to insert new elements in the middle of the container or to append elements to its end. A special type of list, a heterogenic list, can hold elements of different types at the same time.

Sequence Containers

A sequence container organizes a collection of objects of the same type T into a strictly linear arrangement. Following are examples of sequence containers:

Interestingly, built-in arrays are considered sequence containers because STL algorithms are designed to work with them as they work with other sequence types.

Requirements for STL Containment

Elements of STL containers must be copy-constructible and assignable. Essentially, copy-constructible means that an object and a copy of that object must be identical (although the formal definition in the Standard is somewhat more complicated than that). Likewise, assignable means that assigning one object to another results in two identical objects. These definitions might sound trivial because objects in general are copy-constructible and assignable; later, however (when class auto_ptr is discussed), you will see an example of an object that does not meet these requirements and is, therefore, not to be stored in STL containers.

An additional requirement is that container elements must have their copy constructor, default constructor, assignment operator, and destructor publicly declared (either explicitly or implicitly).

The vector Container Class

The standard containers share a common interface, but each container also defines particular operations. Following is the interface of class vector<T>:

namespace std {       template <class T, class Allocator = allocator<T> >
       class vector {
       public:
         // implementation-defined types
         typedef implementation defined                iterator;
         typedef implementation defined                const_iterator;
         typedef implementation defined                size_type;
         typedef implementation defined                difference_type;
         // additional types
         typedef typename Allocator::reference         reference;
         typedef typename Allocator::const_reference   const_reference;
         typedef T  value_type;
         typedef Allocator                             allocator_type;
         typedef typename Allocator::pointer           pointer;
         typedef typename Allocator::const_pointer     const_pointer
         typedef std::reverse_iterator<iterator>       reverse_iterator;
         typedef std::reverse_iterator<const_iterator> const_reverse_iterator;
         // construction, copying destruction and assignment operations
         explicit vector(const Allocator& = Allocator());
         explicit vector(size_type n, const T& value = T(),
                              const Allocator& = Allocator());
         template <class InputIterator>
           vector(InputIterator first, InputIterator last,
             const Allocator& = Allocator());
         vector(const vector<T,Allocator>& x);
        ~vector();
         vector<T,Allocator>& operator=(const vector<T,Allocator>& x);
         template <class InputIterator>
           void assign(InputIterator first, InputIterator last);
         void assign(size_type n, const T& u);
         allocator_type get_allocator() const;
         //iterators
         iterator               begin();
         const_iterator         begin() const;
         iterator               end();
         const_iterator         end() const;
         reverse_iterator       rbegin();
         const_reverse_iterator rbegin() const;
         reverse_iterator       rend();
         const_reverse_iterator rend() const;
         //capacity operations
         size_type size() const;
         size_type max_size() const;
         void      resize(size_type sz, T c = T());
         size_type capacity() const;
         bool      empty() const;
         void      reserve(size_type n);
         //element access operations
         reference       operator[](size_type n);
         const_reference operator[](size_type n) const;
         const_reference at(size_type n) const;
         reference       at(size_type n);
         reference       front();
         const_reference front() const;
         reference       back();
         const_reference back() const;
         // modifiers
         void push_back(const T& x);
         void pop_back();
         iterator insert(iterator position, const T& x);
         void     insert(iterator position, size_type n, const T& x);
         template <class InputIterator>
             void insert(iterator position,
                         InputIterator first, InputIterator last);
         iterator erase(iterator position);
         iterator erase(iterator first, iterator last);
         void     swap(vector<T,Allocator>&);
         void     clear();
       }; //class vector
       //non-member overloaded operators
       template <class T, class Allocator>
         bool operator==(const vector<T,Allocator>& x,
                         const vector<T,Allocator>& y);
       template <class T, class Allocator>
         bool operator< (const vector<T,Allocator>& x,
                         const vector<T,Allocator>& y);
       template <class T, class Allocator>
         bool operator!=(const vector<T,Allocator>& x,
                         const vector<T,Allocator>& y);
       template <class T, class Allocator>
         bool operator> (const vector<T,Allocator>& x,
                         const vector<T,Allocator>& y);
       template <class T, class Allocator>
         bool operator>=(const vector<T,Allocator>& x,
                         const vector<T,Allocator>& y);
       template <class T, class Allocator>
         bool operator<=(const vector<T,Allocator>& x,
                         const vector<T,Allocator>& y);
       //specialized algorithms
       template <class T, class Allocator>
         void swap(vector<T,Allocator>& x, vector<T,Allocator>& y);
     }//namespace std

On most implementations, the parameterized types size_type and difference_type have the default values size_t and ptrdiff_t, respectively. However, they can be replaced by other types for particular specializations.

The storage of STL containers automatically grows as necessary, freeing the programmer from this tedious and error-prone task. For example, a vector can be used to read an unknown number of elements from the keyboard:

#include <vector>
#include <iostream>
using namespace std;
int main()
{
  vector <int> vi;
  for (;;) //read numbers from a user's console until 0 is input
  {
    int temp;
    cout<<"enter a number; press 0 to terminate" <<endl;
    cin>>temp;
    if (temp == 0 ) break; //exit from loop?
    vi.push_back(temp); //insert int into the buffer
  }
  cout<< "you entered "<< vi.size() <<" elements" <<endl;
  return 0;
}//end main

Container Reallocation

The memory allocation scheme of STL containers must address two conflicting demands. On the one hand, a container should not preallocate large amounts of memory because it can impair the system's performance. On the other hand, it is inefficient to allow a container reallocate memory whenever it stores a few more elements. The allocation strategy has to walk a thin line. On many implementations, a container initially allocates a small memory buffer, which grows exponentially with every reallocation. Sometimes, however, it is possible to estimate in advance how many elements the container will have to store. In this case, the user can preallocate a sufficient amount of memory in advance so that the recurrent reallocation process can be avoided. Imagine a mail server of some Internet Service Provider: The server is almost idle at 4 a.m. At 9 a.m., however, it has to transfer thousands of emails every minute. The incoming emails are first stored in a vector before they are routed to other mail servers across the Web. Allowing the container to reallocate itself little by little with every few dozen emails can degrade performance.

What Happens During Reallocation?

The reallocation process consists of four steps. First, a new memory buffer that is large enough to store the container is allocated. Second, the existing elements are copied to the new memory location. Third, the destructors of the elements in their previous location are successively invoked. Finally, the original memory buffer is released. Obviously, reallocation is a costly operation. You can avert reallocation by calling the member function reserve(). reserve(n) ensures that the container reserves sufficient memory for at least n elements in advance, as in the following example:

class Message { /*...*/};
#include <vector>
using namespace std;
int FillWithMessages(vector<Message>& msg_que); //severe time constraints
int main()
{
  vector <Message> msgs;
  // before entering a time-critical section, make room for 1000 Messages
  msgs.reserve(1000);
//no re-allocation should occur before 1000 objects have been stored in vector
  FillWithMessages(msgs);
  return 0;
}

capacity() and size()

capacity() returns the total number of elements that the container can hold without requiring reallocation. size() returns the number of elements that are currently stored in the container. In other words, capacity() - size() is the number of available "free slots" that can be filled with additional elements without reallocating. The capacity of a container can be resized explicitly by calling either reserve() or resize(). These member functions differ in two respects. resize(n) allocates memory for n objects and default-initializes them (you can provide a different initializer value as the second optional argument).

reserve() allocates raw memory without initializing it. In addition, reserve() does not change the value that is returned from size() it only changes the value that is returned from capacity(). resize() changes both these values. For example

#include <iostream>
#include <vector>
#include <string>
using namespace std;
int main()
{
  vector <string> vs;
  vs.reserve(10);  //make room for at least 10 more strings
  vs.push_back(string()); //insert an element   
  cout<<"size: "<< vs.size()<<endl; //output: 1
  cout<<"capacity: "<<vs.capacity()<<endl; //output: 10
  cout<<"there's room for "<<vs.capacity() - vs.size()
      <<" elements before reallocation"<<endl;
  //allocate 10 more elements, initialized each with string::string()
  vs.resize(20);
  cout<<"size: "<< vs.size()<<endl; //output 20
  cout<<"capacity: "<<vs.capacity()<<endl; //output 20;
  return 0;
}

Specifying the Container's Capacity During Construction

Up until now, the examples in this chapter have used explicit operations to preallocate storage by calling either reserve() or resize(). However, it is possible to specify the requested storage size during construction. For example

#include <vector>
using namespace std;
int main()
{
  vector<int>  vi(1000); //initial storage for 1000 int's
  //vi contains 1000 elements initialized by int::int()
  return 0;
}

Remember that reserve() allocates raw memory without initializing it. The constructor, on the other hand, initializes the allocated elements by invoking their default constructor. It is possible to specify a different initializer value, though:

vector<int>  vi(1000, 4); //initial  all 1000 int's with 4

Accessing a Single Element

The overloaded operator [] and the member function at() enable direct access to a vector's element. Both have a const and a non-const version, so they can be used to access an element of a const and a non-const vector, respectively.

The overloaded [] operator was designed to be as efficient as its built-in counterpart. Therefore, [] does not check to see if its argument actually refers to a valid element. The lack of runtime checks ensures the fastest access time (an operator [] call is usually inlined). However, using operator [] with an illegal subscript yields undefined behavior. When performance is paramount, and when the code is written carefully so that only legal subscripts are accessed, use the [] operator. The [] notation is also more readable and intuitive. Nonetheless, runtime checks are unavoidable in some circumstances for instance, when the subscript value is received from an external source such as a function, a database record, or a human operator. In such cases you should use the member function at() instead of operator []. at() performs range checking and, in case of an attempt to access an out of range member, it throws an exception of type std::out_of_range. Here is an example:

#include <vector>
#include <iostream>
#include <string>
#include <stdexcept>
using namespace std;
int main()
{
  vector<string> vs; // vs has no elements currently
  vs.push_back("string"); //add first element
  vs[0] = "overriding string"; //override it using []
  try
  {
    cout<< vs.at(10) <<endl; //out of range element, exception thrown
  }
  catch(std::out_of_range & except)
  {
    // handle out-of-range subscript
  }
}//end main

Front and Back Operations

Front and back operations refer to the beginning and the end of a container, respectively. The member function push_back() appends a single element to the end of the container. When the container has exhausted its free storage, it reallocates additional storage, and then appends the element. The member function pop_back() removes the last element from the container. The member functions front() and back() access a single element at the container's beginning and end, respectively. front() and back() both have a const and a non-const version. For example

#include <iostream>
#include <vector>
using namespace std;
int main()
{
  vector <short> v;
  v.push_back(5);
  v.push_back(10);
  cout<<"front: " << v.front() << endl; //5
  cout<<"back: " << v.back() << endl; //10
  v.pop_back(); //remove v[1]
  cout<<"back: " << v.back() << endl; //now 5
  return 0;
}

Container Assignment

STL containers overload the assignment operator, thereby allowing containers of the same type to be assigned easily. For example

#include <iostream>
#include<vector>
using namespace std;
int main()
{
  vector <int> vi;
  vi.push_back(1);
  vi.push_back(2);
  vector <int> new_vector;
  //copy the contents of vi to new_vector, which automatically grows as needed
  new_vector = vi;
  cout << new_vector[0] << new_vector[1] << endl;   // display 1 and 2
  return 0; 
} 

Contiguity of Vectors

Built-in arrays in C++ reside in contiguous chunks of memory. The Standard, however, does not require that vector elements occupy contiguous memory. When STL was added to the Standard, it seemed intuitive that vectors should store their elements contiguously, so contiguity never became an explicit requirement. Indeed, all current STL implementations follow this convention. The current specification, however, permits implementations that do not use contiguous memory. This loophole will probably be fixed by the Standardization committee in the future, and vector contiguity will become a Standard requirement.

A vector<Base> Should not Store Derived Objects

Each element in a vector must have the same size. Because a derived object can have additional members, its size might be larger than the size of its base class. Avoid storing a derived object in a vector<Base> because it can cause object slicing with undefined results. You can, however, achieve the desired polymorphic behavior by storing a pointer to a derived object in a vector<Base*>.

FIFO Data Models

In a queue data model (a queue is also called FIFO first in first out), the first element that is inserted is located at the topmost position, and any subsequent elements are located at lower positions. The two basic operations in a queue are pop() and push(). A push() operation inserts an element into the bottom of the queue. A pop() operation removes the element at the topmost position, which was the first to be inserted; consequently, the element that is located one position lower becomes the topmost element. The STL queue container can be used as follows:

#include <iostream>
#include <queue>
using namespace std;int main()
{
 queue <int> iq;
 iq.push(93); //insert the first element, it is the top-most one
 iq.push(250);
 iq.push(10); //last element inserted is located at the bottom
 cout<<"currently there are "<< iq.size() << " elements" << endl;
 while (!iq.empty() )
 {
   cout <<"the last element is: "<< iq.front() << endl; //front() returns 
                                                        //the top-most element
  iq.pop(); //remove the top-most element
 }
 return 0;
}

STL also defines a double-ended queue, or deque (pronounced "deck") container. A deque is a queue that is optimized to support operations at both ends efficiently. Another type of queue is a priority_queue. A priority_queue has all its elements internally sorted according to their priority. The element with the highest priority is located at the top. To qualify as an element of priority_queue, an object has to define the < operator (priority_queue is discussed in detail later, in the section titled "Function Objects").

Iterators

Iterators can be thought of as generic pointers. They are used to navigate a container without having to know the actual type of its elements. Several member functions such as begin() and end() return iterators that point to the ends of a container.

begin() and end()

All STL containers provide the begin() and end() pair of member functions. begin() returns an iterator that points to the first element of the container. For example

#include <iostream>
#include <vector>
#include <string>
using namespace std;
int main()
{  vector <int> v(1);  //room for a single element
  v[0] = 10;
  vector<int>::iterator p  = v.begin();   // p points to the first element of v
  *p = 11; //assign a new value to v[0] through p
  cout << *p;  //output 11
  return 0;
}

The member function end(), on the other hand, returns an iterator that points one position past the last valid element of the container. This sounds surprising at first, but there's nothing really unusual about it if you consider how C-strings are represented: An additional null character is automatically appended one position past the final element of the char array. The additional element in STL has a similar role it indicates the end of the container. Having end() return an iterator that points one position past the container's elements is useful in for and while loops. For example

  vector <int> v(10);
  int n=0;
  for (vector<int>::iterator p = v.begin(); p<v.end(); p++)
    *p = n++;

begin() and end() come in two versions: const and non-const. The non-const version returns a non-const iterator, which enables the user to modify the values of the container's element, as you just saw. The const version returns a const iterator, which cannot modify its container.

For example

  const vector <char> v(10);
  vector<char>::iterator p  = v.begin(); //error, must use a const_iterator
  vector<char>::const_iterator cp  = v.begin(); //OK
  *cp = 'a'; //error, attempt to modify a const object
  cout << *cp;  //OK

The member functions rbegin() and rend() (reverse begin() and reverse end()) are similar to begin() and end(), except that they return reverse iterators, which apply to reverse sequences. Essentially, reverse iterators are ordinary iterators, except that they invert the semantics of the overloaded ++ and -- operators. They are useful when the elements of a container are accessed in reverse order.

For example

#include <iostream>
#include <vector>
#include <string>
using namespace std;
void ascending_order()
{
  vector <double> v(10);
  double d = 0.1;
  for (vector<double>::iterator p = v.begin(); p<v.end(); p++) //initialize
  {
    *p = d;
    d+= 0.1;
  }
   //display elements of v in ascending order
  for (vector<double>::reverse_iterator rp = v.rbegin(); rp < v.rend(); rp++)
  {
    cout<< *rp<<endl;
  }
}

Like begin() and end(), rbegin() and rend() have a const and a non-const version.

The Underlying Representation of Iterators

Most implementations of STL use pointers as the underlying representation of iterators. However, an iterator need not be a pointer, and there's a good reason for that. Consider a huge vector of scanned images that are stored on a 6GB disk; the built-in pointer on most machines has only 32 bits, which is not large enough to iterate through this large vector. Instead of a bare pointer, an implementation can use a 64-bit integer as the underlying iterator in this case. Likewise, a container that holds elements such as bits and nibbles (to which built-in pointers cannot refer) can be implemented with a different underlying type for iterators and still provide the same interface. However, bare pointers can sometimes be used to iterate through the elements of a container on certain implementations; for example

#include <vector>
#include <iostream>
using namespace std;
void hack()
{
  vector<int> vi;
  vi.push_back(5);
  int *p = vi.begin();//bad programming practice, although it may work
  *p = 6; //assign vi[0]
  cout<<vi[0]; //output 6 (maybe)
}

Using bare pointers instead of iterators is a bad programming practice avoid it.

"const Correctness" of Iterators

Use the const iterator of a container when the elements that are accessed through it are not to be modified. As with ordinary pointer types, using a non-const iterator implies that the contents of the container are to be changed. A const iterator enables the compiler to detect simple mistakes, and it is more readable.

Initializing a Vector with the Contents of a Built-in Array

As was previously noted, built-in arrays are valid sequence containers. Thus, the addresses of array ends can be used as iterators to initialize a vector with the contents of a built-in array. For example

#include<vector>
#include <iostream>
using namespace std;
int main()
{
  int arr[3];
  arr[0] = 4; arr[1] = 8;  arr[2] = 16;
   vector <int> vi ( &arr[0], //address of  the array's beginning
                     &arr[3] ); // must point one element past the array's end
  cout<< vi[0] << '\t' << vi[1] << '\t' << vi[2] <<endl;  // output: 4  8  16
  return 0;
}

Iterator Invalidation

Reallocation can occur when a member function modifies its container. Modifying member functions are reserve() and resize(), push_back() and pop_back(), erase(), clear(), insert(), and others. In addition, assignment operations and modifying algorithms can also cause reallocation. When a container reallocates its elements, their addresses change. Consequently, the values of existing iterators are invalidated.

For example

#include <iostream>
#include <list>
using namespace std;
int main()
{
  list <double> payroll;
  payroll.push_back(5000.00);
  list<double>::const_iterator p = payroll.begin(); //points to first element
  for (int i = 0 ; i < 10; i++)
  {
      payroll.push_back(4500.00); //insert 10 more elements to payroll; 
                                   //reallocation may occur
  }
      // DANGEROUS
  cout << "first element in payroll: "<< *p <<endl; // p may have 
                                                    //been invalidated
  return 0;
}

In the preceding example, payroll might have reallocated itself during the insertion of ten additional elements, thereby invalidating the value of p. Using an invalid iterator is similar to using a pointer with the address of a deleted object both result in undefined behavior. To be on the safe side, it is recommended that you reassign the iterator's value after calling a modifying member function. For example

list<double>::const_iterator p = payroll.begin();//points to the first element
for (int i = 0 ; i < 10; i++)
{
  payroll.push_back(4500.00); // reallocation may occur here
}
  p = payroll.begin(); // reassign p
  cout <<"first element in payroll: "<<*p<<endl;  // now safe
}

Alternatively, you can prevent reallocation by preallocating sufficient storage before the instantiation of an iterator. For example

int main()
{
  list <double> payroll;
  payroll.reserve(11);
  payroll.push_back(5000.00);
  list<double>::const_iterator p = payroll.begin();  
  for (int i = 0 ; i < 10; i++)
  {
    payroll.push_back(4500.00); //no reallocation
  }
  cout << "first element in payroll: "<< *p <<endl; // OK
  return 0;
}

Algorithms

STL defines a rich collection of generic algorithms that can be applied to containers and other sequences. There are three major categories of algorithms: non-modifying sequence operations, mutating sequence operations, and algorithms for sorting.

Non-Modifying Sequence Operations

Non-modifying sequence operations are algorithms that do not directly modify the sequence on which they operate. They include operations such as search, checking for equality, and counting.

The find() Algorithm

The generic algorithm find() locates an element within a sequence. find() takes three arguments. The first two are iterators that point to the beginning and the end of the sequence, respectively.

The third argument is the sought-after value. find() returns an iterator that points to the first element that is identical to the sought-after value. If find() cannot locate the requested value, it returns an iterator that points one element past the final element in the sequence (that is, it returns the same value as end() does). For example

#include <algorithm> // definition of find()
#include <list>
#include <iostream>
using namespace std;
int main()
{
  list<char> lc;
  lc.push_back('A');
  lc.push_back('T');
  lc.push_back('L');
  list<char>::iterator p = find(lc.begin(), lc.end(), 'A');   // find 'A'
  if (p != lc.end())     // was 'A' found?
     *p = 'S';    // then replace it with 'S'
  while (p != lc.end())   //display the modified list
    cout<<*p++;
  return 0;
}

Mutating Sequence Operations

Mutating sequence algorithms modify the sequence on which they operate. They include operations such as copy, fill, replace, and transform.

The copy() Algorithm

The Standard Library provides a generic copy function, which can be used to copy a sequence of objects to a specified target. The first and the second arguments of copy() are const iterators that mark the sequence's beginning and its end, respectively. The third argument points to a container into which the sequence is copied. The following example demonstrates how to copy the elements of a list into a vector:

#include <algorithm>
#include<list>
#include<vector>
using namespace std;
int main()
{
  list<int> li; vector <int> vi;
  li.push_back(1);
  li.push_back(2);
  vi.reserve( li.size() );  //must make room for copied elements in advance
  //copy list elements into vector, starting at vector's beginning
  copy (li.begin(), li.end(), vi.begin() );
  return 0;
}

Sort Operations

This category contains algorithms for sorting and merging sequences and set-like algorithms that operate on sorted sequences. These include sort(), partial_sort(), binary_search(), lower_bound(), and many others.

The sort() Algorithm

sort() takes two arguments of type const iterator that point to the beginning and the end of the sequence, respectively. An optional third algorithm is a predicate object, which alters the computation of sort (predicate objects and adaptors are discussed shortly). For example

#include <iostream>
#include <algorithm> //definition of sort()
#include <vector>
using namespace std;
int main()
{
  vector <int> vi;
  vi.push_back(7);
  vi.push_back(1);
  vi.push_back(19);
  sort(vi.begin(), vi.end() );   // sort vi; default is ascending order
  cout<< vi[0]  <<", "<< vi[1] <<", "<< vi[2] <<endl;  // output: 1, 7, 19
  return 0;
}

One way to force a descending order is to use reverse iterators:

sort(vi.rbegin(), vi.rend() ); // now sort in descending order
cout<< vi[0] <<", "<<vi[1]<<", "<<vi[2]<<endl; // output: 19, 7, 1

Requirements for Sorting Containers

When sort() operates on a container, it uses the relational operators == and < of the container's elements. User-defined types that do not support these operators can still be stored in a container, but such a container cannot be sorted.

Function Objects

It is customary to use a function pointer to invoke a callback routine. In an object-oriented environment, nonetheless, a function can be encapsulated in a function object (see also Chapter 3, "Operator Overloading"). There are several advantages to using a function object, or functor, instead of a pointer. Function objects are more resilient because the object that contains the function can be modified without affecting its users. In addition, compilers can inline a function object, which is nearly impossible when function pointers are used. But perhaps the most compelling argument in favor of function objects is their genericity a function object can embody a generic algorithm by means of a member template.

Implementation of Function Objects

A function object overloads the function call operator. A generic function object defines the overloaded function call operator as a member function template. Consequently, the object can be used like a function call. Remember that the overloaded operator () can have a varying number of arguments, and any return value. In the following example, a function object implements a generic negation operator:

#include <iostream>
#include <vector>
using namespace std;
class negate
{
public : //generic negation operator
  template < class T > T operator()  (T t) const { return -t;}
};
void callback(int n, const negate& neg) //pass a function object rather 
                                        //than a function pointer
{
  n = neg(n);  //invoke the overloaded () operator to negate n
  cout << n;
}
int main()
{
  callback(5, negate() ); //output: -5
  return 0;
}

Uses of Function Objects

Some container operations use function objects. For example, a priority_queue uses the less function object to sort its elements internally. The following example demonstrates a scheduler that stores tasks with different priorities in a priority_queue. Tasks that have higher priority are located at the top. Tasks with identical priority are located according to the order of their insertion, as in an ordinary queue:

#include <functional> // definition of less
#include <queue>  // definition of priority_queue
#include <iostream>
using namespace std;
struct Task
{
  int priority;
  friend bool operator < (const Task& t1, const Task& t2);
  Task(int p=0) : priority(p) {}
};
bool operator < (const Task& t1, const Task& t2)
{
  return t1.priority < t2.priority;
}
int main()
{
  priority_queue<Task> scheduler;
  scheduler.push(Task(3));
  scheduler.push(Task(5));
  scheduler.push(Task(1));
  scheduler.push(Task(1));
  cout<< scheduler.top().priority <<endl;   // output 5
  return 0;
}

Predicate Objects

A predicate is an expression that returns a Boolean value. Similarly, a function object that returns a Boolean value is a predicate object. STL defines several predicate objects that can be used to alter the computation of a generic algorithm. These predicate objects are defined in the header <functional>. In a previous example, you saw the operation of the algorithm sort(). The third argument of sort() is a predicate that alters the computation of this algorithm. For example, the predicate greater<int> can be used to override the default ascending order. Likewise, the predicate less<int> restores the original ascending order:

#include <functional> //definitions of STL predicates
#include <algorithm> //definition of sort
#include <vector>
#include <iostream>
using namespace std;
int main()
{
  vector <int> vi;
  vi.push_back(9);
  vi.push_back(5);
  vi.push_back(10);
  sort(vi.begin(), vi.end(), greater<int> () );  // descending order
  cout<< vi[0] << '\t' << vi[1] << '\t' << vi[2] <<endl;   // output: 10  9  5
  sort(vi.begin(), vi.end(), less<int> () );   // now in ascending order
  cout<< vi[0] << '\t' << vi[1] << '\t' << vi[2] <<endl;   // output: 5  9  10
  return 0;
}

Adaptors

An adaptor is a component that modifies the interface of another component. STL uses several types of adaptors: sequence adaptors, iterator adaptors, and function adaptors.

Sequence Adaptors

A sequence adaptor is a container that is built upon another container and that modifies its interface. For example, the container stack is usually implemented as a deque, whose non-stack operations are hidden. In addition, stack uses the operations back(), push_back(), and pop_back() of a deque to implement the operations top(), push(), and pop(), respectively. For example

#include <string>
#include <stack>
#include <iostream>
using namespace std;
int main()
{
  stack <string> strstack;
  strstack.push("Bjarne");
  strstack.push("Stroustrup");
  string topmost = strstack.top();
  cout<< "topmost element is: "<< topmost << endl; // "Stroustrup"
  strstack.pop();
  cout<< "topmost element is: "<< strstack.top() << endl; // "Bjarne"
  return 0;
}

Calling the member function pop() on an empty stack is an error. If you are not sure whether a stack contains any elements, you can use the member function empty() to check it first. For example

stack<int> stk;
//...many lines of code
if (!stk.empty() ) //test stack before popping it
{
    stk.pop();
}

Iterator Adaptors

The interface of an iterator can be altered by an iterator adaptor. The member functions rend() and rbegin() return reverse iterators, which are iterators that have the meanings of operators ++ and -- exchanged. Using a reverse iterator is more convenient in some computations.

Function Adaptors

Earlier you saw the use of greater as a function adaptor for changing the computation of sort(). STL also provides negators, which are used to reverse the result of certain Boolean operations. Binders are another type of adaptors, which convert a binary function object into a unary function object by binding an argument to a specific value.

Allocators

Every STL container uses an allocator that encapsulates the memory model that the program uses. Allocators hide the platform-dependent details such as the size of pointers, memory organization, reallocation model, and memory page size. Because a container can work with different allocator types, it can easily work in different environments simply by plugging a different allocator into it. An implementation provides a suitable allocator for every container. Normally, users should not override the default allocator.

Specialized Containers

Chapter 9, "Templates," discussed the benefits of defining template specializations to optimize and rectify the behavior of a primary template for a particular type. vector has a specialized form that manipulates Boolean values optimally, namely vector<bool>. This specialization is implemented in a way that squeezes each element into a single bit, rather than a bool variable, albeit with the familiar vector interface. For example

#include <vector>
#include <iostream>
using namespace std
void transmit(vector <bool> &binarystream)
{
  cout<<binarystream[0]; // subscript operator provided
  vector<bool>::const_iterator bit_iter = binarystream.begin(); //iterators
  if (binarystream[0] == true)
  {/* do something */ }
}

Associative Containers

An associative array is one for which the index need not be an integer. An associative array is also called map or dictionary. STL defines several associative containers. A map, for instance, stores pairs of values; one serves as the key, and the other is the associated value. The template pair<class Key, class Value> serves as a map element. In the following example, a map is used to translate the string value of an enumerator into its corresponding integral value. The string is the key whose associated value is an int:

#include <map>
#include <string>
#include <iostream>
using namespace std;
enum directions {up, down};
int main()
{
  pair<string, int> Enumerator(string("down"), down); //create a pair
  map<string, int> mi; //create a map
  mi.insert(Enumerator); //insert the pair
  int n = mi["down"]; //n = 1 //string used as subscript
  return 0;
}

A map can store only unique keys. A multimap is a map that can store duplicate keys.

set is similar to a map except that the associated values are irrelevant in this case. A set is used when only the keys are important: to ensure that a database transaction does not attempt to insert a record with a unique key that already exists in a table, for example. multiset is a set that allows duplicate keys.

Class auto_ptr

The class template auto_ptr implements the "resource acquisition is initialization" idiom (discussed in Chapter 5, "Object-Oriented Programming Design"). It is initialized by a pointer to an object allocated on the free store (auto_ptr has a default constructor so you can instantiate an empty auto_ptr and assign a pointer to it later). The destructor of auto_ptr destroys the object that is bound to the pointer. This technique can avoid memory leakage in the case of exceptions (see also Chapter 6, "Exception Handling"), or it can simplify programming by sparing the hassle of explicitly deleting every object allocated on the free store. Class auto_ptr is declared in the standard header <memory>. Following is an example of using auto_ptr (points of possible object destruction are numbered):

#include <memory>
using namespace std;
void f() { if (condition) throw "err";}
int main()
{
  try
  {
    auto_ptr<double> dptr(new double(0.0));
    *dptr = 0.5; //overloaded * provides pointer-like syntax
    f();  
  } // 1: no exception was thrown, dptr destroyed here
  catch(...)
  { // 2: an exception was thrown, dptr destroyed here
  }
  return 0;
}

It is guaranteed that the memory that was allocated from the free store is released: If f() throws an exception, the dptr object is destroyed during stack unwinding (2) and, as a result, the memory that was allocated from the free store is released. Otherwise, dptr is destroyed when the try block exits (1).

STL Containers Should not Store auto_ptr Elements

Elements of STL containers must be copy-constructible and assignable, as was noted previously. During reallocation, a container copy-constructs its elements in a new memory location and destroys the original elements by invoking their destructor. However, an auto_ptr is not copy-constructible. Rather, it provides strict ownership semantics, which means that it owns the object to which it holds a pointer (ownership is also discussed in Chapter 5). Copying an auto_ptr object copies that pointer and transfers ownership to the destination. This stands in contrast to the copy-constructible and assignable requirements: One copy of auto_ptr holds a pointer to the free store object, whereas the other copy doesn't. (If more than one auto_ptr owns the same object at the same time, the results are undefined.) Therefore, auto_ptr objects are not to be stored in STL containers.

Nearly Containers

STL defines three additional components that behave, in many ways, like ordinary containers. They have automatic memory management, they have iterators, and they share a container-like interface with member functions such as begin() and end(). Still, they are not considered "first-class citizens" in the STL catalog because they are not generic. A string is similar to vector but is confined to char data type. valarray resembles vector, albeit with a strong bias toward numerical computations. The third class in this category is bitset, which is a set designed to store and manipulate bits in an efficient way. These nearly containers have a limited use for general purposes, except for string.

Class string

std::string is a shorthand for std::basic_string<char>, as you saw in Chapter 9. string provides many of the operations of ordinary STL containers; for example, it conforms to the requirements of a sequence and it defines iterators. However, string is optimized for the use of a character string exclusively.

The consideration in the design of string included utmost efficiency, support for C-strings, and generality (that is, string is not targeted for a particular application use).

Constructors

string has a default constructor and five more constructors:

namespace std
{
template<class charT, class traits = char_traits<charT>,
              class Allocator = allocator<charT> >
  class basic_string {
  public:
  //...
    explicit basic_string(const Allocator& a = Allocator());
    basic_string(const basic_string& str, size_type pos = 0,
                 size_type n = npos, const Allocator& a = Allocator());
    basic_string(const charT* s,
                 size_type n, const Allocator& a = Allocator());
    basic_string(const charT* s, const Allocator& a = Allocator());
    basic_string(size_type n, charT c, const Allocator& a = Allocator());
    template<class InputIterator>
    basic_string(InputIterator begin, InputIterator end,
                 const Allocator& a = Allocator());
    //...
    };
}

In other words, a string can be initialized by a C-string, by another string object, by part of a C-string, by a sequence of characters, or by part of another string. Following are some examples:

#include <string>
using namespace std;
void f()
{
  const char text[] = "hello world";
  string s = text;  //initialization of string object with a C-style string
  string s2(s);  //copy construction
  string s3(&text[0], &text[5]); // part of a C-string; s3 = "hello"
  string s4(10, 0); //a sequence of zero initialized characters 
  string s5 ( s2.begin(), s2.find(' ')); //initialized part of another string
                                         //s5 = "hello"
}

It is important to note that when the initializer is a pointer to char, string does not check the pointer. It is the programmer's responsibility to ensure that the pointer is valid and that it is not NULL. Otherwise, the results are undefined. For example

#include<string>
using std::string;
const char * getDescription(int symbol); // may return a NULL pointer
string& writeToString  (int symbol)
{
  // sloppy: initializer might be NULL; undefined behavior in this case
  string *p = new string(getDescription(symbol));
  return *p;
}

string does not check for a NULL pointer to avoid the incurred performance overhead. Remember that standard C functions such as strcpy() avoid this overhead too. Even if string did check the char pointer, it is unclear what it is to do in the case of a NULL value. Clearly, a NULL pointer is not a valid C-string, so creating an empty string is incorrect. Throwing an exception is perhaps a plausible approach, but it incurs additional runtime overhead and is not always desirable.

A safer implementation of the preceding example might check the initializing pointer to make sure that it is not NULL:

string& writeToString (int symbol)
{
  const char *p = getDescription(symbol);
  if (p) // now safe
  {
    string *pstr = new string(p);
    return *pstr;
  }
  return *new string;
}

Conversion to a C-string

Class string provides two member functions that return the const char * representation of its object. The following sections discuss these member functions.

The c_str() Member Function

string does not define a char * conversion operator. There are two reasons for this. First, implicit conversions can cause undesirable surprises when you least expect it (refer to Chapter 3). Another reason is that C-strings must be null-terminated. The underlying representation of a string object is implementation-dependent and might not use a null-terminated sequence. Therefore, an implicit conversion of a string object in a context that requires a null-terminated array of characters can be disastrous. For these reasons, string does not provide such a conversion operator. Instead, an explicit call to string::c_str() is required. c_str() returns the const char * representation of its object. For example

void f()
{
  string  s = "Hello";
  if( strcmp( s.c_str(), "Hello")== 0)
   cout <<"identical"<<endl;
 else
   cout<<"different"<<endl;
}

The pointer that is returned from c_str () is owned by the string object. The user should not attempt to delete it or to modify its associated char array. The returned pointer is not to be used after a non-const member function has been called.

The data() Member Function

The member function data() also returns a const char * representation of its object (but the resultant array might not be null-terminated).

Accessing a Single Element

There are two ways to access a single character from a string object. One is to use the overloaded operator [], as in the following example:

#include <string>
using namespace std;
void  first()
{
  string s = "hello world";
  char c = s[0];  //assign  'h'
}

Another way is to use the member function at(). Similar to class vector, string::at() performs range-checking and throws an exception of type std::out_of_range when an attempt is made to access an out-of-range character.

Clearing the Contents of A string

To explicitly erase the contents of a string, you can use the member function erase(). For example

#include <iostream>
#include <string>
using namespace std;
void f()
{
  char key;
  string msg = "press any key to continue";
  cout<<msg<<endl;
  cin<<key;
  msg.erase(); //clear msg
}

Comparison

string defines three versions of operator ==:

  bool operator == (const string& left, const string right);
  bool operator == (const char* left, const string right);
  bool operator == (const string& left, const char* right);

This proliferation might seem redundant because string has a constructor that automatically converts a const char * to a string object. Therefore, only the first version of operator == is necessary. However, the overhead of creating a temporary string can be unacceptable under some circumstances: The temporary string has to allocate memory on the free store, copy the C-string, and then release the allocated memory. The Standardization committee's intent was to make comparison of strings as efficient as possible. Therefore, the additional versions of operator == were added to enable efficient comparisons.

Additional Overloaded Operators

As was previously noted, a string can be assigned another string, a C-string, or a single character. Similarly, there are three versions of the overloaded operator += that support concatenation of another string, a C-string, or a single character to an existing string. For example

#include <string>
using namespace std;
void f()
{
  string s1 = "ab"
  string s2= "cd";
  s1+=s2;
  s1+= "ef";
  s1+='g';
}

string also defines an overloaded + that returns a string that concatenates its operands. Similarly, the operators < and > perform a lexicographical comparison between their operands.

Performance Issues

string is probably the most widely used class in C++ programs. The efficiency of its design and implementation were cardinal. For example, string provides optimized container operations such as find(), copy(), and replace() that are specifically designed to manipulate characters efficiently. In some respects, string objects are even more efficient than char * in terms of speed and space. The following sections discuss two aspects of string's performance: size computation and reference counting.

Size

string has a data member that holds its size. Calculating the size of a string object is, therefore, a fast constant time operation. On the other hand, the performance of strlen() is proportional to the number of characters that are stored in a C-string. When large strings are used and size computations are frequent, std::string is more efficient than a C-string.

Reference Counting

In a nutshell, a reference counted model counts how many instances of a class have an identical state. When two or more instances share the same state, the implementation creates only a single copy and counts the number of existing references to this copy. For example, an array of strings can be represented as a single string object that holds the number of elements in the array (reference counting is not confined to arrays). Since initially the array elements share the same state (they all are empty strings), only a single object is needed. When one of the array elements changes its state (if it is assigned a different value, for instance), the existing object creates one more object this is called "copy on write". As you can see, the reference counting model can enhance performance in terms of both memory usage and speed. The Standard's specification of class string is formulated to allow but it does not require a reference counted implementation.

A reference counted implementation must have the same semantics as a non-reference counted implementation. For example

string str1("xyz");
string::iterator i = str1.begin();
string str2 = str1;
*i = 'w';  //must modify only  str1

Conclusions

STL was designed to allow maximal reusability without sacrificing efficiency. The Standard specifies performance requirements with which STL containers and algorithms must comply. These performance specifications are the minimum requirements; an implementation might offer better performance.

The plug compatibility of STL components, which enables the user to create other components or to modify the interface of existing components, is remarkable. Other frameworks and libraries impose severe constraints on the use of their components, considerably limiting their plug-compatibility.

STL is regarded by C++ creators as the most important addition to the language in recent years. Mastering STL is a worthwhile investment. It is estimated that other programming languages will follow the role model of STL and provide similar generic frameworks. Three major advantages of preferring STL to homemade containers are


Contents


© Copyright 1999, Macmillan Computer Publishing. All rights reserved.