Chapter 19: Concrete examples of C++

We're always interested in getting feedback. E-mail us if you like this guide, if you think that important material is omitted, if you encounter errors in the code examples or in the documentation, if you find any typos, or generally just if you feel like e-mailing. Send your email to Frank Brokken.

Please state the document version you're referring to, as found in the title (in this document: 5.2.0a) and please state the paragraph you're referring to.

All mail received is seriously considered, and new (sub)releases of the Annotations will normally reflect your suggestions for improvements. Except for the incidental case I will not otherwise acknowledge the receipt of suggestions for improvements. Please don't misinterpret this for lack of appreciation.

This chapter presents a number of concrete examples of programming in C++. Items from this document such as virtual functions, static members, etc. are rediscussed. Examples of container classes are shown.

Also, with the advent of the ANSI/ISO standard, classes supporting streams based on file descriptors are no longer available, including the Gnu procbuf extension. These classes were frequently used in older C++ programs. This section of the C++ Annotations develops an alternative: classes extending streambuf, allowing the use of file descriptors, and classes around the fork() system call.

Another example digs into the peculiarities of using a parser- and scanner-generator with C++. Once the input for a program exceeds a certain level of complexity, it's advantageous to use a scanner- and parser-generator for creating the code which does the actual input recognition. The example describes the usage of these tool in a C++ environment.

19.1: `streambuf' classes using file descriptors

19.1.1: A class for output operations

Usually extensions to the ANSI/ISO standard are available allowing us to read from and/or write to file descriptors. However, these extensions are not standard, and may thus vary across compilers. As a file descriptor can be considered a device, it seems natural to use the class streambuf as the starting point for the construction of classes interfacing file descriptors. In this section we will discuss the construction of a class which may be used to write to a device defined by its file descriptor: it may be a file, but could be a pipe or socket. Section 19.1.2 will discuss the matter of reading from devices that are known by their file descriptors, while section 19.3 will reconsider the question of redirection, which was discussed earlier in section 5.8.3.

Basically, deriving a class for output operations is very simple. The only member function that must be overridden is the virtual int overflow(int c) member. This member is responsible for writing characters to the device in the case where there is no buffer or no more room in the buffer. If fd represents the file descriptor to write to, and if we decide against using a buffer, the member overflow() can be as simple as:

    class UnbufferedFD: public std::streambuf
    {
        public:
            int overflow(int c)
            {
                if (c != EOF)
                {
                    if (write(fd, &static_cast<char>(c), 1) != 1)
                        return EOF;
                }
                return c;
            }
            ...
    }
The reveived argument is either written as char to the file descriptor, or EOF is returned. However, this approach does not use an output buffer. As the use of a buffer is strongly advised (see also the next section), we will discuss the construction of a class using an output buffer is somewhat greater detail.

When an output buffer is to be used, the overflow() member gets a bit more complex, due to the fact that it is now called only when the buffer is full. So, first we will have to flush the buffer, for which the (virtual) function streambuf::sync() should be used. Defining sync() and using it in overflow() is not all that must be done: the information to write might be less than the size of the buffer. So, at the end of the lifetime of our special streambuf object, it might find itself having a buffer that is only partially full. So, we must make sure that the buffer is flushed by the time our object goes out of scope. This is of course very simple: sync() should be called in the destructor as well.

Now that we've considered the consequences of using an output buffer, we're almost ready to construct our derived class. We add a couple of extra features. First, we permit the specification of the size of the output buffer by the user of the class. Second, we permit the construction of an object of the class before the file descriptor to use is actually known. Later, in section 19.4 we'll encounter a situation where this feature will be used. In order to save some space, the successful operation of various calls were not checked. In `real life' implementations these checks should of course not be omitted. Here is the class ofdnstreambuf:

    #ifndef _OFDNSTREAMBUF_H_
    #define _OFDNSTREAMBUF_H_
    
    #include <unistd.h>
    #include <streambuf>
    #include <ios>
    
    class ofdnstreambuf: public std::streambuf
    {
        unsigned d_bufsize;
        int     d_fd;
        char    *d_buffer;

        public:
            ofdnstreambuf()
            :
                d_bufsize(0),
                d_buffer(0)
            {}
    
            ofdnstreambuf(int fd, unsigned bufsize = 1)
            {
                open(fd, bufsize);
            }
    
            ~ofdnstreambuf()
            {
                if (d_buffer)
                {
                    sync();                                     // 1
                    delete d_buffer;
                }
            }
    
            void open(int fd, unsigned bufsize = 1)
            {
                d_fd = fd;
                d_bufsize = bufsize == 0 ? 1 : bufsize;
            
                d_buffer = new char[d_bufsize];
                setp(d_buffer, d_buffer + d_bufsize);           // 2
            }
    
            int sync()
            {
                if (pptr() > pbase())
                {
                    write(d_fd, d_buffer, pptr() - pbase());    // 3
                    setp(d_buffer, d_buffer + d_bufsize);       // 4
                }
                return 0;
            }
    
            int overflow(int c)
            {
                sync();                                         // 5
                if (c != EOF)
                {
                    *pptr() = static_cast<char>(c);             // 6
                    pbump(1);
                }
                return c;
            }
    };

    #endif
Depending on the number of arguments, the following program uses the ofdstreambuf class to copy its standard input to file descriptor STDOUT_FILENO, which is the symbolic name of the file descriptor used for the standard output. Here is the program:
#include "fdout.h"
#include <string>
#include <iostream>
#include <istream>

using namespace std;

int main(int argc)
{
    ofdnstreambuf   fds(STDOUT_FILENO, 500);
    ostream         os(&fds);

    switch (argc)
    {
        case 1:
            os << "COPYING cin LINE BY LINE\n";
            for (string  s; getline(cin, s); )
                os << s << endl;
        break;        

        case 2:
            os << "COPYING cin BY EXTRACTING TO os.rdbuf()\n";

            cin >> os.rdbuf();      // Alternatively, use:  cin >> &fds;
        break;

        case 3:    
            os << "COPYING cin BY INSERTING cin.rdbuf() into os\n";
            os << cin.rdbuf();
        break;
    }
}

19.1.2: Classes for input operations

When a class performing input operations must be derived from the class streambuf, the class should use an input buffer of at least one character, to allow the use of the member functions istream::putback() or istream::ungetc(). Stream classes (like istream) normally allow the ungetting of at least one character using their member functions putback() or ungetc(). This is important, as these stream classes usually interface to streambuf objects. Although strictly speaking it is not necessary to implement a buffer in classes derived from streambuf, we strongly suggest to use buffers is these cases: the implementation is very simple and straightforward, and the applicability of such classes is greatly improved. Therefore, in all our classes derived from the class streambuf at least a buffer of one character will be defined.

19.1.2.1: Using a one-character buffer

When deriving a class (e.g., ifdstreambuf) from streambuf using a buffer of at least one character, at the minimum the member streambuf::underflow() should be overridden, as this is the member to which all requests for input eventually degenerate. Since also a buffer is needed, the member streambuf::setg() is used to inform the streambuf baseclass of the dimensions of the input buffer, so that it is able to set up its input buffer pointers so that eback(), gptr(), and egptr() return correct values. Here is the corresponding class definition:

    #include <unistd.h>
    #include <streambuf>
    #include <ios>
    
    class ifdstreambuf: public std::streambuf
    {
        protected:                                              // 1
            int     d_fd;
            char    d_buffer[1];
    
        public:
            ifdstreambuf(int fd)
            :
                d_fd(fd)
            {
                setg(d_buffer, d_buffer + 1, d_buffer + 1);     // 2
            }
    
            int 
            underflow() 
            {
                if (gptr() < egptr())                           // 3
                    return *gptr();                             // 4
        
                if (read(d_fd, d_buffer, 1) <= 0)               // 5
                    return EOF;
        
                setg(d_buffer, d_buffer, d_buffer + 1);         // 6
                return *gptr();
            }
    };
Please note the following: The above ifdstreambuf class is used in the following program:
    #include "ifdbuf.h"
    #include <iostream>
    #include <istream>
    
    using namespace std;
    
    int main(int argc)
    {
        ifdstreambuf
            fds(0);
        istream
            is(&fds);
    
        cout << is.rdbuf();
    }        

19.1.2.2: Using an n-character buffer

How complex would things get if we decide to use a buffer of substantial size? Not that complex. The following class allows use to specify the size of a buffer, and it is basically the same class as ifdstreambuf from the previous section. To make things a bit more interesting, in the current class ifdnstreambuf the member streambuf::xsgetn() was also overridden, to optimize the reading of series of characters at once. Also, a default constructor is provided which can be used in combination with the added open() member in cases where we want to construct a istream object before we know the file descriptor that must be used. Later, in section 19.4 we'll encounter such a situation. In order to save some space, the successful operation of various calls were not checked. In `real life' implementations these checks should of course not be omitted. Here is the new class ifdnstreambuf:

    #ifndef _IFDNBUF_H_
    #define _IFDNBUF_H_

    #include <unistd.h>
    #include <streambuf>
    #include <ios>
    
    class ifdnstreambuf: public std::streambuf
    {
        protected:
            int         d_fd;
            unsigned    d_bufsize;
            char*       d_buffer;
        public:
            ifdnstreambuf()
            :
                d_bufsize(0),
                d_buffer(0)
            {}

            ifdnstreambuf(int fd, unsigned bufsize = 1)
            {
                open(fd, bufsize);
            }

            ~ifdnstreambuf()
            {
                if (d_bufsize)
                {
                    close(d_fd);
                    delete d_buffer;                                // 1
                }
            }

            void open(int fd, unsigned bufsize = 1)
            {
                d_fd = fd;
                d_bufsize = bufsize;
                d_buffer = new char[d_bufsize];
                setg(d_buffer, d_buffer + d_bufsize, d_buffer + d_bufsize);
            }
    
            int underflow()
            {
                if (gptr() < egptr())
                    return *gptr();
    
                int nread = read(d_fd, d_buffer, d_bufsize);
    
                if (nread <= 0)
                    return EOF;
        
                setg(d_buffer, d_buffer, d_buffer + nread);         // 2
                return *gptr();
            }
    
            std::streamsize xsgetn(char *dest, std::streamsize n)   // 3
            {
                int nread = 0;
    
                while (n)
                {
                    if (!in_avail())
                    {
                        if (underflow() == EOF)                     // 4
                            break;
                    }
    
                    int avail = in_avail();
    
                    if (avail > n)
                        avail = n;
    
                    memcpy(dest + nread, gptr(), avail);            // 5
                    gbump(avail);                                   // 6
    
                    nread += avail;
                    n -= avail;
                }
    
                return nread;
            }
    };

    #endif
With this class, please note: The member function xsgetn() is called by streambuf::sgetn(), which is a streambuf member. The following example illustrates the use of this member function with a ifdnstreambuf object:
    #include "ifdnbuf.h"
    #include <iostream>
    #include <istream>
    
    using namespace std;
    
    int main(int argc)
    {
        ifdnstreambuf fds(0, 30);   // internally: 30 char buffer
        char buf[80];               // main() reads blocks of 80 
                                    // chars
        while (true)
        {
            unsigned n = fds.sgetn(buf, 80); 
            if (n == 0)
                break;
            cout.write(buf, n);
        }
    }        

19.1.2.3: Seeking positions in `streambuf' objects

When devices support seek operations, classes derived from streambuf should override streambuf::seekoff() and streambuf::seekpos(). In the following class ifdseek, which is derived from our previously constructed class ifdstreambuf, we assume that the device of which we have a file descriptor available suports these seeking operations. The class ifdseek is derived from ifdstreambuf, so it uses a character buffer of just one character. The facilities to perform seek operations, which are added to our new class ifdseek will make sure that the input buffer is reset when a seek operation is requested. The class could also be derived from the class ifdnstreambuf, in which case the arguments to reset the input buffer must be adapted such that its second and third parameter point beyond the available input buffer. Here is the class definition:

    #include "ifdbuf.h"
    #include <unistd.h>
    
    class ifdseek: public ifdstreambuf
    {
        typedef std::streambuf::pos_type        pos_type;               // 1
        typedef std::streambuf::off_type        off_type;
        typedef std::ios::seekdir               seekdir;
        typedef std::ios::openmode              openmode;               // 2
    
        public:
            ifdseek(int fd)                                             // 3
            :
                ifdstreambuf(fd)
            {}
    
            pos_type seekoff(off_type offset, seekdir dir, openmode)    // 4
            {
                pos_type pos = 
                    lseek
                    (
                        d_fd, offset, 
                        (dir ==  std::ios::beg) ? SEEK_SET :
                        (dir ==  std::ios::cur) ? SEEK_CUR :
                                                  SEEK_END
                    );
    
                if (pos < 0)
                    return -1;
    
                setg(d_buffer, d_buffer + 1, d_buffer + 1);             // 5
                return pos;
            }
    
            pos_type seekpos(pos_type offset, openmode mode)            // 6
            {
                return seekoff(offset, std::ios::beg, mode);
            }
    };
An example of a program using the class ifdseek is the following. If this program is given its own source file using input redirection then seeking is supported, and apart from the first line, every other line is shown twice:
    #include "fdinseek.h"
    #include <string>
    #include <iostream>
    #include <istream>
    #include <iomanip.h>
    
    using namespace std;
    
    int main(int argc)
    {
        ifdseek fds(0);
        istream is(&fds);
        string  s;
    
        while (true)
        {
            if (!getline(is, s))
                break;
    
            streampos pos = is.tellg();
    
            cout << setw(5) << pos << ": `" << s << "'\n";
    
            if (!getline(is, s))
                break;
    
            streampos pos2 = is.tellg();
    
            cout << setw(5) << pos2 << ": `" << s << "'\n";
    
            if (!is.seekg(pos))
            {
                cout << "Seek failed\n";
                break;
            }
        }
    }        

buffer is refilled

19.1.2.4: Multiple `unget()' calls in `streambuf' objects

As stated earlier streambuf classes and classes derived from streambuf should at least support ungetting the character that was last read. Special care must be taken when series of unget() calls are to be supported. In this section we will show how to construct a class which is derived from streambuf and which allows a configurable number of istream::unget() or istream::putback() calls. Support for multiple (say `n') unget() calls is realized by setting aside an initial section of the input buffer, which is then gradually filled up by the last n characters read. Here is an implementation of a class supporting series of unget() calls:

    #include <unistd.h>
    #include <streambuf>
    #include <ios>
    #include <algorithm>
    
    class fdunget: public std::streambuf
    {
        int         d_fd;
        unsigned    d_bufsize;
        unsigned    d_reserved;
        char*       d_buffer;
        char*       d_base;

        public:
            fdunget (int fd, unsigned bufsz, unsigned unget)        // 1
            :
                d_fd(fd),
                d_reserved(unget)                                   // 2
            {
                unsigned allocate =                                 // 3
                        bufsz > d_reserved ?
                            bufsz
                        :
                            d_reserved + 1;
    
                d_buffer = new char [allocate];                     // 4
    
                d_base = d_buffer + d_reserved;                     // 5
                setg(d_base, d_base, d_base);                       // 6
    
                d_bufsize = allocate - d_reserved;                  // 7
            }
    
            ~fdunget()
            {
                delete d_buffer;
            }
    
            int 
            underflow()             
            {
                if (gptr() < egptr())
                    return *gptr();                             
    
                unsigned consumed =                                 // 8
                    static_cast<unsigned>(gptr() - eback());    
    
                unsigned move = std::min(consumed, d_reserved);     // 9
    
                memcpy(d_base - move, egptr() - move, move);        // 10
    
                int nread = read(d_fd, d_base, d_bufsize);          // 11
                if (nread <= 0)       // none read -> 
                    return EOF;
        
                setg(d_base - move, d_base, d_base + nread);        // 12
    
                return *gptr();
            }
    };
Here is an example of a program illustrating the use of the class fdunget. The program reads at most 10 characters from the standard input, stopping at EOF. A guaranteed unget-buffer of 2 characters is defined, in a buffer of 3 characters. Just before reading a character it is tried to unget at most 6 characters. This is of course never possible, but the program will neatly unget as many characters as possible, considering the actual number of characters read:
    #include "fdunget.h"
    #include <string>
    #include <iostream>
    #include <istream>
    
    using namespace std;
    
    int main(int argc)
    {
        fdunget
            fds(0, 3, 2);
        istream
            is(&fds);
        char
            c;
    
        for (int idx = 0; idx < 10; ++idx)
        {
            cout << "after reading " << idx << " characters:\n";
            for (int ug = 0; ug <= 6; ++ug)
            {
                if (!is.unget())
                {
                    cout 
                    << "\tunget failed at attempt " << (ug + 1) << "\n" 
                    << "\trereading: '";
    
                    is.clear();
                    while (ug--)
                    {
                        is.get(c);
                        cout << c;
                    }
                    cout << "'\n";
                    break;
                }
            }
    
            if (!is.get(c))
            {
                cout << " reached\n";
                break;
            }    
            cout << "Next character: " << c << endl;
        }
    }        
    /*
        Generated output after 'echo abcde | program':

        after reading 0 characters:
                unget failed at attempt 1
                rereading: ''
        Next character: a
        after reading 1 characters:
                unget failed at attempt 2
                rereading: 'a'
        Next character: b
        after reading 2 characters:
                unget failed at attempt 3
                rereading: 'ab'
        Next character: c
        after reading 3 characters:
                unget failed at attempt 4
                rereading: 'abc'
        Next character: d
        after reading 4 characters:
                unget failed at attempt 4
                rereading: 'bcd'
        Next character: e
        after reading 5 characters:
                unget failed at attempt 4
                rereading: 'cde'
        Next character: 
        
        after reading 6 characters:
                unget failed at attempt 4
                rereading: 'de
        '
         reached
    */

19.2: Using form() with ostream objects

In the ANSI/ISO standard does not include the previously available Gnu extension form() (and scan()) for stream objects. In this section the construction of a class oformstream is described, which is derived from ostream and does support form() and vform(). Analogously to oformstream a class iscanstream can be developed, supporting scan() and vscan(). The contruction of this latter class is left as an exercise to the reader.

Here is the class interface and definition. It is annotated below:

    #include <iostream>
    
    class oformstream: public std::ostream              // 1
    {
        public:
            oformstream(std::ostream &ostr)             // 2
            :
                std::ostream(ostr.rdbuf())
            {}

            std::ostream &form(char const *fmt, ...);   // 3
            
    };

A program using the class oformstream is given in the next example. It prints a well-known string and some marker text:

    #include "oformstream.h"
    #include <memory>
    #include <stdarg.h>

    std::ostream &oformstream::form(char const *fmt, ...)   // 1
    {
        va_list list;
        va_start(list, fmt);

        unsigned n = vsnprintf(0, 0, fmt, list);            // 2
        std::auto_ptr<char> buf(new char[n + 1]);           // 3
        vsprintf(buf.get(), fmt, list);                     // 4

        return *this << buf.get();                          // 5
    }

    using namespace std;

    int main()
    {
        oformstream
            of(cout);

        of << "Hello world\n";
        cout << "Ok, ";
        of << "That's all, folks\n";

        of.form("%s\n", "Hello world of C++") << endl;
    };

/*
    Generated output:

    Hello world
    Ok, That's all, folks
    Hello world of C++
*/

In the above example:

19.3: Redirection revisited

Earlier, in section 5.8.3 it was noted that within a C++ program streams could be redirected using the ios::rdbuf() member function. By assigning the streambuf of a stream to another stream, both stream objects access the same streambuf, thus realizing redirection at the level of the programming language itself.

It should be realized that this is fine within the context of the C++ program, but if that context is left, the redirection is terminated, as the operating system does not know about streambuf objects. This happens, e.g., when a program uses a system() call to start a subprogram. The following program uses C++ redirection to redirect the information inserted into cout to a file, and then calls

    system("echo hello world")
to echo a well-known line of text. Since echo writes its information to the standard output this would be the program's redirected file if C++'s redirection would be recognized by the operating system. However, this is not the case, so hello world still appears at the program's standard output and not on the redirected file. A solution of this problem involves redirection at the operating system level, for which system calls like dup() and dup2() are available in some operating systems (e.g., Unix and friends). Examples of the use of these system calls are given in section 19.4 in this chapter.

Here is the example of the failing redirection at the system level following C++ redirection using streambuf redirection:

    #include <iostream>
    #include <fstream>
    #include <cstdlib>

    using namespace::std;

    int main()
    {
        ofstream
            of("outfile");

        cout.rdbuf(of.rdbuf());
        cout << "To the of stream" << endl;
        system("echo hello world");
        cout << "To the of stream" << endl;
    }
    /*
        Generated output: on the file `outfile'

        To the of stream
        To the of stream
            On standard output:
        hello world
    */

19.4: The fork() system call

From the C prorgramming language, the fork() system call is well known. When a program needs to start a new process, system() can be used, but this requires the program to wait for the child process to terminate. The more general way to spawn subprocesses is to call fork().

In this section we will see how C++ can be used to wrap classes around a complex system call like fork(). Much of what follows in this section directly applies to the Unix operating system, and the discussion will therefore focus on that operating system. However, other systems usually provide comparable facilities. The following discussion is based heavily on the notion of design patterns, as published by Gamma et al. (1995)

When fork() is called, the current program is duplicated in memory, thus creating a new process, and both processes continue their execution just below the fork() system call. The two processes may, however, inspect the return value of fork(): the return value in the original process (called the parent process) differs from the return value in the newly created process (called the child process):

A basic Fork class should hide the bookkeeping details of a system call like fork() from its users. The Fork class developed below will do just that. The class itself only needs to take care of the proper execution of the fork() system call. Normally, fork() is called to start a child process, usually boiling down to the execution of a separate process. This child process may expect input at its standard input stream and/or may generate output to its standard output and/or standard error streams. Fork does not know all this, and does not have to know what the child process is. However, a Fork object should be able to activate the child process.

Unfortunately, it is not known to the constructor of the Fork class what actions the child process should perform. Similarly, it is not known what actions the parent process should perform. For this particular situation, the template method design pattern was developed. According to Gamma c.s., the template method design pattern

``Define(s) the skeleton of an algorithm in an operation, deferring some steps to subclasses. (The) Template Method (design pattern) lets subclasses redefine certain steps of an algorithm, without changing the algorithm's structure.''

This design pattern allows us to define a base class in which the essential steps of the fork() system call have already been implemented, deferring the implementation of parts of the normal use of the fork() system call to subclasses.

In particular, the Fork base class will implement:

These two functions are the only parts of the public interface of the class. All its remaining members can be used only by derived classes. Already implemented are:

Then, the following essential steps of the normal use of fork() are left to be implemented by derived classes:

The above two members must be implemented by derived classes. In addition, the following two members may be implemented by derived classes: Redirection of the standard streams will be necessary if the child- and parent processes intend to communicate with each other via the standard streams.

The header file implementing the Fork base class can now be constructed. It is:

    class Fork
    {
        int d_pid;

        public:
            virtual ~Fork()
            {}

            void fork();

        protected:
            int getPid()
            {
                return d_pid;
            }

            virtual void childRedirections()    // do child redirections
            {}            
            virtual void childProcess() = 0;    // must be implemented
            

            virtual void parentRedirections()   // do parent redirections
            {}            
            virtual void parentProcess() = 0;   // must be implemented

            int waitForChild();                 // returns the status
    };
The implementation of the member function fork() provides for the calling of the system function fork() (Caution: since the system function fork() is called in the member function having the same name, the :: scope resolution operator must be used to prevent a recursive call of the member function itself), whereafter, depending on the return value of ::fork(), either the parentProcess() or the childProcess() is called. Redirection of the standard streams may be necessary. The redirection are to be defined in childRedirections() and parentRedirections():
    #include "fork.ih"
    
    void Fork::fork()
    {
        if ((d_pid = ::fork()) < 0)
            throw "Fork::fork() failed";
    
        if (d_pid == 0)                 // childprocess has pid == 0
        {
            childRedirections();
            childProcess();
    
            exit (1);                   // we shouldn't come here: 
                                        // childProcess should exit
        }
    
        parentRedirections();
        parentProcess();
    }
In fork.cc the class' internal header file fork.ih is included. This header file takes care of the inclusion of the necessary system header files, as well as the inclusion of fork.h itself. Its implementation (omitting the redundant include guards) is:
    #include "fork.h"
    
    #include <cstdlib>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>

Normally, the child process does not return: once it has completed its task it should terminate as a separate process. The parent process usually waits for its child process to complete. The terminating child process informs its parent that it is about to terminate by sending a signal to the parent, which should be caught by the parent. If the child terminates and the parent process does not catch the signal generated by the child process then the child process remains visible as a so-called zombi process. If the parent has to wait for the child, the member waitForChild() can be called in the implementation of the function parentProcess() to wait for the child's termination signal. Its implementation is:

    #include "fork.ih"
    
    int Fork::waitForChild()
    {
        int status;
    
        waitpid(d_pid, &status, 0);
    
        return WEXITSTATUS(status);
    }
This simple implementation basically returns the child's exit status to the parent. The called system function waitpid() blocks until the child terminates.

There is, however, a situation where the child process continues to live, but the parent dies. In nature this happens all the time: parents tend to die before their children do. In our context (i.e. C++), this represents a daemon program, where the parent process dies and the child program actually continues as a child of the basic init process. Again, when the child eventually dies a signal is sent to its ` step-parent' init. No zombi is created here, as init catches the termination signals of all its (step-) children.

The construction of a daemon process is so simple, that no special Fork object is needed (see section 19.4.1 for an example).

19.4.1: The `Daemon' program

Applications exist in which the only purpose of fork() is to start a child process. The parent process terminates immediately after spawning the child process. If this happens, the child process continues to run as a child process of init, the always running first process on Unix systems. Such a process is often called a daemon, running as a background process.

Although the following example can easily be constructed as a plain C program, it was included in the C++ Annotations because it is so closely related to the current discussion of the Fork class. I thought about adding a daemon() member to that class, but eventually decided against it because the construction of a daemon program is very simple and uses none of the features currently offered by the Fork class.

Here is an example illustrating the construction of a daemon program:

    #include <iostream>
    #include <unistd.h>
    #include "fork.h"

    class Daemon: public Fork
    {
        public:
            virtual void parentProcess()        // the parent does nothing.
            {}

            virtual void childProcess()
            {
                sleep(3);                       // actions taken by the child
                                                // just a message...
                std::cout << "Hello from the child process\n";

                exit (0);                       // The child process should
            }                                   // exit here.
    };
                
    int main()
    {
        Daemon daemon;

        daemon.fork();          // program immediately returns
        return 0;
    }

    /*
        Generated output:
    The next command prompt, then after 3 seconds:
    Hello from the child process
    */

19.4.2: The `Pipe' class

In order to set up redirection at the system level, file descriptors are necessary. When two processes want to communicate using file descriptors, the following takes place: Though basically simple, errors may easily creep in: functions of descriptors given the processes (child- or parent-) may easily get mixed up. To prevent bookkeeping errors with the associated file descriptors, the bookkeeping may be tucked away in a class, like the Pipe class that is constructed here. The interface of the class is:
    class Pipe
    {
        private:
            enum    RW { READ, WRITE };
            int     d_fd[2];

            void redirect(int d_fd, int alternateFd);
    
        public:
            Pipe();

            int useForReading();
            void useForReadingFrom(int fileDescriptor);   

            int useForWriting();
            void useForWritingTo(int fileDescriptor);
            void useForWritingTo(int const *fileDescriptors, unsigned n = 2);
    };
The constructor constructs a set of associated file descriptors using the pipe() system call. The class has two sets of member functions: The member functions of the class Pipe were implemented as follows:
    #include "pipe.h"
    #include <unistd.h>

    Pipe::Pipe()
    {
        if (pipe(d_fd))
            throw "Pipe::Pipe(): pipe() failed";
    }

    void Pipe::redirect(int d_fd, int alternateFd)
    {
        if (dup2(d_fd, alternateFd) < 0)
            throw "Pipe: redirection failed";
    }

    int Pipe::useForReading()
    {
        close(d_fd[WRITE]);
        return d_fd[READ];
    }

    void Pipe::useForReadingFrom(int fd)
    {
        useForReading();
    
        redirect(d_fd[READ], fd);
        close(d_fd[READ]);
    }

    int Pipe::useForWriting()
    {
        close(d_fd[READ]);
        return d_fd[WRITE];
    }

    void Pipe::useForWritingTo(int const *fd, unsigned n)
    {
        useForWriting();
    
        for (int idx = 0; idx < n; idx++)
            redirect(d_fd[WRITE], fd[idx]);

        close(d_fd[WRITE]);
    }

    void Pipe::useForWritingTo(int fd)
    {
        useForWritingTo(&fd, 1);
    }

Pipe class and its derivatives IPipe and OPipe are constructed here. Objects of these classes may be used to construct sets of associated file descriptors.

Pipe defines and initializes the file descriptor array. Here is the definition of the class Pipe:

    class Pipe
    {
        private:
            enum    RW { READ, WRITE };
            int     d_fd[2];

            void redirect(int d_fd, int alternateFd);
    
        public:
            Pipe();

            int useForReading();
            void useForReadingFrom(int fileDescriptor);   

            int useForWriting();
            void useForWritingTo(int fileDescriptor);
            void useForWritingTo(int const *fileDescriptors, unsigned n = 2);
    };

19.4.3: The `ParentSlurp' class

Now that we've seen Fork and Pipe, it's time for an application. The class ParentSlurp, derived from Fork, is a class which starts a child process which execs a program (like /bin/ls). The (standard) output of the program is read by the parent, who will (for demonstration purposes) write the received lines to its standard output stream, while prepending linenumbers to the lines. It is most convenient here to redirect the parents standard input stream, so that the parent can read the output from the child process from its std::cin input stream.

In this case, the following members must be implementated:

Here is the interface of the ParentSlurp class:
    #include "fork.h"
    #include "pipe.h"

    #include <unistd.h>
    
    class ParentSlurp: public Fork
    {
        Pipe    d_pipe;

        public:
            ParentSlurp()
            {}

        protected:
            virtual void childRedirections()   
            {
                d_pipe.useForWritingTo(STDOUT_FILENO);
            }

            virtual void parentRedirections()  
            {
                d_pipe.useForReadingFrom(STDIN_FILENO);
            }

            virtual void childProcess()
            {
                execl("/bin/ls", "/bin/ls", 0);
            }

            virtual void parentProcess();
    };
And here is the implementation. The parentProcess() member repeatedly reads lines from its standard input, until all lines have been read. Then it waits for the child to complete. The main() function simply constructs a ParentSlurp object, and calls its fork() member. Output should consist of a numbered list of files in the directory where the program is started. An example is provided with the implementation given below. Note that the program also needs the fork.o, pipe.o and waitforchild.o object files (see the earlier sources):
    #include "parentslurp.h"

    #include <string>
    #include <iostream>

    void ParentSlurp::parentProcess()
    {
        std::string     line;
        unsigned int    nr = 1;

        while (getline(std::cin, line))
            std::cout << nr++ << ": " << line << std::endl;

        waitForChild();
    }

    int main()
    {
        ParentSlurp ps;

        ps.fork();
        return 0;
    }
    /*
        Generated Output (example only, actually obtained output may differ:

    1: a.out
    2: bitand.h
    3: bitfunctional
    4: bitnot.h
    5: daemon.cc
    6: fdinseek.cc
    7: fdinseek.h
    ...
    */

19.4.4: Communicating with multiple children

The next step up the ladder is the construct a child-process monitor. Here, the parent process is responsible for its child processes, but it also reads its standard input. The user may enter information at the standard input, and a simple command language is defined: Furthermore, the child process that hasn't received text for some time will complain, and will send a message to that effect to the parent-process. The parent process will simply transmit the received message to the user, by copying it to the standard output stream.

A problem with programs like these is that they allow input from multiple sources: input may appear from the standard input as well as from the read-side of the communication pipes. Also, multiple output channels are used. To handle situations like this, the select() system call was developed.

19.4.4.1: The `Select' class

The select() system call was developed to handle synchronous I/O multiplexing. This system call can be used to handle, e.g., input appearing synchroneously at a set of file descriptors.

Using the select() system call is rather complex, and its full discussion is beyond the scope of the C++ Annotations.

However, its use may be simplified by providing a class Selector, which hides away the details of using select(). The interface of such a class is:

    #ifndef _INCLUDED_SELECTOR_H_
    #define _INCLUDED_SELECTOR_H_

    #include <sys/time.h>
    #include <sys/types.h>
    #include <unistd.h>
    #include <limits.h>

    class Selector
    {
        fd_set          d_read;
        fd_set          d_write;
        fd_set          d_except;
        fd_set          d_ret_read;
        fd_set          d_ret_write;
        fd_set          d_ret_except;
        timeval         d_alarm;
        int             d_max;
        int             d_ret;
        int             d_readidx;
        int             d_writeidx;
        int             d_exceptidx;

        public:
            Selector();

            int wait();

            int nReady()
            {           
                return d_ret;
            }

            int getReadFd()
            {              
                return checkSet(&d_readidx, d_ret_read);
            }

            int getWriteFd()
            {               
                return checkSet(&d_writeidx, d_ret_write);
            }

            int getExceptFd()
            {                
                return checkSet(&d_exceptidx, d_ret_except);
            }

            
            void setAlarm(int sec, int usec = 0)
            {
                d_alarm.tv_sec  = sec;
                d_alarm.tv_usec = usec;
            }

            void noAlarm()
            {
                setAlarm(INT_MAX, INT_MAX);
            }

            void addReadFd(int fd)
            {
                addFd(&d_read, fd);
            }

            void addWriteFd(int fd)
            {
                addFd(&d_write, fd);
            }

            void addExceptFd(int fd)
            {
                addFd(&d_except, fd);
            }

            void rmReadFd(int fd)
            {
                FD_CLR(fd, &d_read);
            }

            void rmWriteFd(int fd)
            {
                FD_CLR(fd, &d_write);
            }

            void rmExceptFd(int fd)
            {
                FD_CLR(fd, &d_except);
            }

        private:
            int checkSet(int *index, fd_set &set);
            void addFd(fd_set *set, int fd);
    };

    #endif
The class has a rather extensive interface, but that's mainly the consequence of select() being a complex function. It's members are: Almost all members are so simple, that they can be defined inline. The two exceptions, Selector() and wait() were implemented as follows:
    #include "selector.h"

    Selector::Selector()
    {
        FD_ZERO(&d_read);
        FD_ZERO(&d_write);
        FD_ZERO(&d_except);
    
        noAlarm();
    
        d_max = 0;
    }

    int Selector::wait()
    {
        timeval t = d_alarm;
    
        d_ret_read = d_read;
        d_ret_write = d_write;
        d_ret_except = d_except;
    
        d_readidx = 0;
        d_writeidx = 0;
        d_exceptidx = 0;
    
        d_ret = select(d_max, &d_ret_read, &d_ret_write, &d_ret_except, &t);
    
        if (d_ret < 0)
            throw "Selector::wait()/select() failed";
        
        return d_ret;
    }


    void Selector::addFd(fd_set *set, int fd)
    {
        FD_SET(fd, set);
        if (fd >= d_max)
            d_max = fd + 1;
    }
    
    int Selector::checkSet(int *index, fd_set &set)
    {
        int &idx = *index;
    
        while (idx < d_max && !FD_ISSET(idx, &set))
            ++idx;
    
        return idx == d_max ? -1 : idx++;
    }

19.4.4.2: The `Child' class

The child-processes started by the monitor are defined by the class Child. The Child class is derived from the class Fork, so childProcess() and parentProcess() members must be defined. However, in this case the parent process is empty, as all communications with the child process take place in the Monitor object (see section 19.4.4.3).

The Child object will set up redirections in such a way that it can read its standard input and output in the child process itself, and will make the opposites ends of the corresponding pipes known to the parent process via two auxiliary functions, getReadFd() and getWriteFd().

Note that the reading end of a pipe in the child process corresponds to the writing end of a pipe in the paren process (and vice versa). The Child object is responsible for the bookkeeping that's involved here: the user of the Child process shouldn't have to juggle with the opposite ends of pipes, but should be given the appropriate ends, including their functions. Hence, getReadFd(), which is used by the parent process, will produce the file descriptor from which the parent may read information sent to it by its child.

Apart from this noteworthy characteristic, Child's interface doesn't contain special features. Here it is:

    #include "pipe.h"
    #include "fork.h"

    #ifndef _INCLUDED_SELECTOR_H_
    #include "selector.h"
    #endif

    class Child: public Fork
    {
        Pipe                d_in;
        Pipe                d_out;
        Selector            d_selector;
    
        int         d_parentReadFd;
        int         d_parentWriteFd;
        int         d_nr;
    
        public:
            Child(int nr)
            :
                d_nr(nr)
            {}

            int getReadFd()
            {
                return d_parentReadFd;
            }
            int getWriteFd()
            {
                return d_parentWriteFd;
            }

            int getPid()
            {
                return Fork::getPid();
            }
        
            int getNr()
            {
                return d_nr;
            }
    
            virtual void childRedirections();
            virtual void parentRedirections();
            virtual void childProcess();
            virtual void parentProcess()
            {}
    };

The implementation of the member functions is given below. Please pay some attention to the way the ends of the two pipes are used: in childRedirections() the redirection to the standard streams takes place, in parentRedirections() the d_out pipe's reading file descriptor is assigned to d_parentReadFd, which is then returned by getReadFd(). The analogous situation holds true for the d_in pipe.

Next, the childProcess() will add the STDIN_FILENO to the d_selector object's set of read file descriptors. Since redirection has taken place, the child process can simply concentrate on the redirected file descriptor.

Then, in a perpetual loop the child process waits for the Selector object to return from its wait() call. When the alarm goes off, it sends a message to its standard output (hence, into the writing pipe) that it's still standing by, otherwise it will echo the messages appearing at its standard input to its standard output.

Here is the implementation of the Child members:

    #include "child.h"
    #include <iostream>
    #include <string>

    using namespace std;

    void Child::childRedirections()
    {
        d_in.useForReadingFrom(STDIN_FILENO);
        d_out.useForWritingTo(STDOUT_FILENO);
    }

    void Child::parentRedirections()
    {
        d_parentReadFd = d_out.useForReading();
        d_parentWriteFd = d_in.useForWriting();
    }

    void Child::childProcess()
    {
        Selector    selector;
        unsigned    message = 0;
        
        selector.addReadFd(STDIN_FILENO);
        selector.setAlarm(5);

        while (true)
        {
            try
            {
                if (!selector.wait())       // timeout
                    cout << "Child " << d_nr << ": standing by\n";
                else
                {
                    string  line;
                    getline(cin, line);
                    cout << "Child " << d_nr << ":" << ++message << ": " << 
                                                        line << endl;
                }
            }
            catch (...)
            {
                    cout << "Child " << d_nr << ":" << ++message << ": " << 
                                "select() failed" << endl;
            }
        }
        exit(0);
    }

19.4.4.3: The `Monitor' class

The Monitor object does all the work. The public interface of the class MonitorIts public interface is rather simple, as it just contains the function to start the whole process. Here is the interface of the class Monitor. Note that it contains a nested class Find. This nested class is discussed below, at the description of the stopChild() member function:

    #include <string>
    #include <map>

    #ifndef _INCLUDED_SELECTOR_H_
    #include "selector.h"
    #endif

    #include "child.h"

    class Monitor
    {
        enum Commands
        {
            UNKNOWN,
            START,
            EXIT,
            STOP,
            TEXT
        };

        Selector                d_selector;
        int                     d_nr;
        std::map<int, Child *>  d_child;

        public:
            Monitor()
            :
                d_nr(0)
            {}

            void run();

        private:
            static void waitForChild(int signum);
            static void killChild(std::map<int, Child *>::value_type it);

            int     next(int &value, std::string &line);
            void    processInput();
            void    processChild(int fd);
            void    exiting();
            void    stopChild(int value);
            void    sendChild(int value, std::string const &line);
            void    createNewChild();

            class Find
            {
                int     d_nr;
        
                public:
                    Find(int nr)
                    :
                        d_nr(nr)
                    {}
                    bool operator()(std::map<int, Child *>::value_type &vt) 
                                                                        const
                    {
                        return d_nr == vt.second->getNr();
                    }
            };
    };
Apart from the constructor, there is just one public member: run(). This member does the actual child-handling, as follows: The member run() was implemented as follows:
    #include "monitor.ih"

    void Monitor::run()
    {
        signal(SIGCHLD, waitForChild);

        d_selector.addReadFd(STDIN_FILENO);

        while (true)
        {
            cout << "? " << flush;
            try
            {
                d_selector.wait();

                int fd;

                while ((fd = d_selector.getReadFd()) != -1)
                {
                    if (fd == STDIN_FILENO)
                        processInput();
                    else
                        processChild(fd);
                }
            }
            catch (...)
            {
                cerr << "select failed, exiting\n";
                exiting();
            }
        }
    }

The member processInput() reads the commands from the user via the standard input. The member itself is rather simple: it calls next() to obtain the next command entered by the user, and then calls the corresponding function. This member and its helper member are not complicated. The sources, which were kept rather basic, will hardly need further elaboration at this point in the Annotations:

    #include "monitor.ih"

    void Monitor::processInput()
    {
        string line;
        int    value;

        switch (next(value, line))
        {
            case START: 
                createNewChild();
            break;    
    
            case EXIT: 
                exiting();
            break;    
    
            case STOP: 
                stopChild(value);
            break;    
    
            case TEXT: 
                sendChild(value, line);
            break;    
    
            default:
                cout << "unknown: " << line << "\n";
            break;
        }
    }

    int Monitor::next(int &value, string &line)
    {
        if (!getline(cin, line))
            throw "Command::next(): reading cin failed";
    
        if (line == "start")
            return START;

        if (line == "exit")
            return EXIT;

        if (line.find("stop") == 0)
        {
            istringstream istr(line.substr(4));
            istr >> value;
            return !istr ? UNKNOWN : STOP;
        }

        istringstream istr(line.c_str());
        istr >> value;
        if (istr)
        {
            getline(istr, line);
            return TEXT;
        }

        return UNKNOWN;
    }

All remaining input sensed by select() originated in a child process. The processChild() member is called with the appropriate file descriptor. Then, using a ifdstreambuf (see section 19.1.2.1), the input appearing at the file descriptor is read from an input stream. The communication protocol that is used here is also rather basic: To every line of input sent to a child, the child sends exactly one line of text in return. Consequently, processChild() can just read the line of text, and return:

    #include "monitor.ih"

    void Monitor::processChild(int fd)
    {
        ifdstreambuf ifdbuf(fd);
        istream istr(&ifdbuf);
        
        string line;
        getline(istr, line);
        cout << d_child[fd]->getPid() << ": " << line << endl;
    }
Please note the construction d_child[fd]->getPid() that is used in the above source. Looking at the interface of Monitor, a map<int, Child *> is observed. This map contains the input file descriptor as its key, and a pointer to the Child object as its value. A pointer is used here, rather than a Child object, since we do want to use the facilities offered by the map, but don't want to copy a Child object.

The responsibilities to destroy the Child object once it becomes superfluous is now, of course, with the programmer, and not any more with the run-time support system.

A new element is added to the map by the member createNewChild(), which is called in response of the user entering the start command. Here is its definition:

    #include "monitor.ih"

    void Monitor::processChild(int fd)
    {
        ifdstreambuf ifdbuf(fd);
        istream istr(&ifdbuf);
        
        string line;
        getline(istr, line);
        cout << d_child[fd]->getPid() << ": " << line << endl;
    }
Here the new processes are started. Once the Child object is created, its fork() member is called. This will (as fork() uses the template method design pattern, see section 19.4) call the child- and parent- related functions in, respectively, the child- and parent-processes. As the child process never returns (see section 19.4.4.2), and Child::parentProcess() is an empty function, the code below the cp->fork() call is only executed in the parent process, which is the main program. This code simply adds the file descriptor that's available for reading information from the child to the set of input file descriptors monitored by select(), and stores the pointer to the Child object in the d_child map.

Direct communication with the child is required as the result of the stop <nr> and <nr> text commands, which will, respectively, terminate child process <nr> and send the text text to child process <nr>. In both cases, the program has to figure out which process belongs to a certain child number. Here the find_if() generic algorithm (see section 17.4.14) was used to look for the requested child-number in the d_child map. The definition of the class Find could be part of the internal header file monitor.ih, or, alternatively, it could be defined as a nested private class in Monitor's class interface. Here the former approach was followed: Find is defined as a nested class of Monitor.

The stopChild() member looks for the requested child process number and, if found, removes its input file descriptor from the set monitored by the select() system call. Then the child process is terminated by calling killChild(), whereafter the Child object is erased from the map. Note that this erasure does not destroy the Child object, so before erasing the map-element, the Child object is deleted first:

    #include "monitor.ih"

    void Monitor::stopChild(int nr)
    {
        map<int, Child *>::iterator it =
            find_if(d_child.begin(), d_child.end(), Find(nr));

        if (it != d_child.end())
        {
            d_selector.rmReadFd(it->second->getReadFd());

            killChild(*it);

            delete it->second;
            d_child.erase(it);        
        }
        else
            cerr << "No child number " << nr << endl;
    }

Comparably, the sendChild() member sends some information to a particular child process. The same lookup action is required here as with stopChild(): using a Monitor::Find object the validity of the requested child process number is obtained, whereafter a ofdnstreambuf object is constructed (see section 19.1.1) to be used with an ostream object, to send information to the child.

Three member functions remain to be discussed:

Here are their sources:
    #include "monitor.ih"

    void Monitor::waitForChild(int signum)
    {
        int status;
        wait(&status);
        
        signal(SIGCHLD, waitForChild);
    }

    void Monitor::killChild(map<int, Child *>::value_type it)
    {
        if (kill(it.second->getPid(), SIGTERM))
            cerr << "Couldn't kill process " << it.second->getPid() << endl;
    }

    void Monitor::exiting()
    {
        for_each(d_child.begin(), d_child.end(), killChild);
        exit(0);
    }

Finally, the program's main() function is simply:

    #include "monitor.h"

    int main()
    {
        Monitor monitor;

        monitor.run();
    }

    /*
        Example of a session:

    # a.out
    ? start
    Child 1 started
    ? 1 hello world
    ? 3394: Child 1:1:  hello world
    ? 1 hi there!
    ? 3394: Child 1:2:  hi there!
    ? start
    Child 2 started
    ? 3394: Child 1: standing by
    ? 3395: Child 2: standing by
    ? 3394: Child 1: standing by
    ? s3395: Child 2: standing by
    ? top 1
    ? 3395: Child 2: standing by
    ? 2 hello world
    ? 3395: Child 2:1:  hello world
    ? 1 hello world
    No child number 1
    ? exit3395: Child 2: standing by
    ? 
    #     
    */

19.5: Function objects performing bitwise operations

In section 17.1 several types of predefined function objects were introduced. Predefined function objects exists performing arithmetic operations, relational operations, and logical operations exist, corresponding to a multitude of binary- and unary operator unary operators.

For whatever reason, some operators are missing: there are no predefined function objects corresponding to bitwise operations. However, their construction is, given the available predefined function objects, not difficult. The following examples show a template class implementing a function object calling the bitwise and ( operator&()), and a template class implementing a function object calling the unary not ( operator~()). Other missing operators can be constructed similarly. This is, however, left as an exercise to the reader.

Here is the implementation of a function object calling the bitwise operator&():

    #include <functional>

    template <typename _Tp>
    struct bit_and : public std::binary_function<_Tp,_Tp,_Tp> 
    {
        _Tp operator()(const _Tp& __x, const _Tp& __y) const 
        { 
            return __x & __y; 
        }
    };

Here is the implementation of a function object calling operator~():

    #include <functional>

    template <typename _Tp>
    struct bit_not : std::public unary_function<_Tp,_Tp> 
    {
        _Tp operator()(const _Tp& __x) const 
        { 
            return ~__x; 
        }
    };

These and other missing predefined function objects are also implemented in the file bitfunctional, which is found in one of the cplusplus zip-archives.

An example using bit_and() to remove all odd numbers from a vector of int values, using the remove_if() generic algorithm is:

    #include <iostream>
    #include <algorithm>
    #include <vector>
    #include "bitand.h"

    using namespace std;

    int main()
    {
        vector<int>
            vi;
        
        for (int idx = 0; idx < 10; ++idx)
            vi.push_back(idx);

        copy
        (
            vi.begin(), 
            remove_if(vi.begin(), vi.end(), bind2nd(bit_and<int>(), 1)),
            ostream_iterator<int>(cout, " ")
        );
        cout << endl;
    }
    /*
        Generated output:

        0 2 4 6 8
    */

19.6: Implementing a reverse_iterator

In this section we will extend the class Vector, introduced in section 18.3 to contain a RandomAccessIterator reverse_iterator.

First, we note that applying the ordinary iterators of the Vector class (made available by the begin() and end() member functions) to the generic algorithms is without any problem, as the pointers that are returned by these members are accepted by the generic algorithms. Here is a Vector template class:

    #include <algorithm>

    template <typename Type>
    class Vector
    {
        typedef Type *iterator;
    
        public:
            Vector()
            {
                init(0);
            };
            Vector(unsigned n)
            {
                init(n);
            }
            Vector(Vector<Type> const &other)
            {
                construct(other);
            }
            ~Vector()
            {
                delete [] data;
            }            
            Vector<Type> const &operator=(Vector<Type> const &other)
            {
                if (this != &other)
                {
                    delete [] data;
                    construct(other);
                }
                return *this;
            }
            Type &operator[](unsigned index) throw(char const *)
            {
                if (index >= (beyond - data))
                    throw "Vector array index out of bounds";
                return data[index];
            }

            void push_back(Type const &value)
            {
                if (!data)
                {
                    init(1);
                    beyond = data;
                }
                else if (beyond == end_of_storage)
                {
                    Vector<Type>
                        enlarged((end_of_storage - data) << 1);
                    copy(data, beyond, enlarged.data);
                    delete [] data;
                    beyond = enlarged.data + (beyond - data);
                    data = enlarged.data;
                    end_of_storage = enlarged.end_of_storage;
                    enlarged.data = 0;
                }
                *beyond++ = value;
            }
            iterator begin()
            {
                return data;
            }
            iterator end()
            {
                return beyond;
            }
            unsigned size()
            {
                return beyond - data;
            }
        private:
            void init(unsigned n)
            {
                if (n)
                {
                    data = new Type[n];
                    beyond = data + n;
                    end_of_storage = data + n;
                }
                else
                {
                    data = 0;
                    beyond = 0;
                    end_of_storage = 0;
                }
            }                    
            void construct(Vector<Type> const &other)
            {
                init(other.beyond - other.data);
                copy(other.data, other.beyond, data);
            }

            Type
                *data,
                *beyond,
                *end_of_storage;
    };

And a simple program using Vector:

    #include <iostream>
    #include "vector.h"

    using namespace std;
    
    int main()
    {
        Vector<int>
            vi;

        for (int idx = 0; idx < 10; ++idx)
            vi.push_back(idx);

        random_shuffle(vi.begin(), vi.end());

        copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " "));
        cout << endl;
    }
    /*
        generated output, e.g.,

        1 7 9 3 5 6 0 4 8 2 
    */

However, the use of a reverse iterator is not without problems: the reverse iterator shows rather complex behavior, in that it visits the elements in its range backwards, resulting in a reversal of the meanings of the operators that are used to step forward and step backward in a range. If a reverse iterator rit points to element 11 in a range, then it will point to element 10, rather than 12 after ++rit. Because of this complex behavior, the reverse iterator will be implemented as a class reverse_iterator, which can do the required bookkeeping for us.

Furthermore, in the current context reverse_iterator is a reverse iterator only for the elements of the class Vector. So, a nested class seems to be appropriate here. Since Vector has already been defined, we will decide to leave it untouched.

Rather than modifying Vector, a template class Vector2 is derived of it, and Vector2 will be given its nested class reverse_iterator. This latter class can be derived from std::random_access_iterator to realize a RandomAccessIterator for the class Vector2:

    template <typename Type>
    class Vector2: public Vector<Type>
    {
        public:
            class reverse_iterator: 
                public std::random_access_iterator<Type, ptrdiff_t>
            {
                friend class Vector2<Type>;     // see the text, below
                ...
            };
        ...
    };
The ptrdiff_t type that is mentioned here is ordinarilly defined as an int (e.g., with the Linux operating system it is defined as a __kernel_ptrdiff_t in types.h, which is defined as an int in posix_types.h).

The reverse_iterator class is now defined as a nested template class, derived from the template class random_access_iterator. Its surrounding class is Vector2, which itself is a template class that was derived from the template class Vector:

    #include <iterator>
    #include "vector.h"

    template <typename Type>
    class Vector2: public Vector<Type>
    {
        public:
            Vector2()
            :
                Vector<Type>()
            {}
            Vector2(unsigned n)
            :
                Vector<Type>(n)
            {}

            class reverse_iterator: 
                        public std::random_access_iterator<Type, ptrdiff_t> 
            {
                friend class Vector2<Type>;

                reverse_iterator(Type *begin, Type *end);

                Type *d_begin;
                Type *d_end;
    
                public:
                    reverse_iterator(reverse_iterator const &other);

//                  reverse_iterator &operator=(reverse_iterator const &other);

                    Type &operator*() const;

                    bool operator<(reverse_iterator const &other) const;
                    bool operator==(reverse_iterator const &other) const;
                    bool operator!=(reverse_iterator const &other) const
                    {
                        return !(*this == other);
                    }    
                    reverse_iterator &operator++();
                    reverse_iterator operator++(int);
                    reverse_iterator &operator--();

                    reverse_iterator operator--(int);
                    int operator-(reverse_iterator const &other) const;
                    reverse_iterator operator+(int step) const;
                    reverse_iterator operator-(int step) const;
            };
    
            reverse_iterator rbegin()
            {
                return reverse_iterator(begin(), end());
            }
            reverse_iterator rend()
            {
                return reverse_iterator(begin(), begin());
            }
    };                

Within reverse_iterator we store two Type pointers, called begin and end, which will be used to define the reversed range [end - 1, begin - 1). In the private section of reverse_iterator we declare:

    Type *d_begin;
    Type *d_end;
The members d_begin and d_end receive their initial values from the begin() and end() members of the Vector object.

However, the class reverse_iterator is closely tied to Vector2: only Vector2 objects should be allowed to construct their own reverse iterators. This policy is enforced by making Vector2<Type> a bound friend of reverse_iterator, and by declaring its constructor private. The constructor expects the two Type pointers, defining the range [begin(), end()). Internally begin() - 1 will be used as endpoint of the reversed range, while end() - 1 will be used as the beginning of the reversed range.

The constructor, defined in the private section of the reverse_iterator class, is therefore:

    reverse_iterator(Type *begin, Type *end)
    :
        d_begin(begin),
        d_end(end)
    {}
In the implementation of reverse_iterator, d_end will gradually walk towards d_begin, which will keep its value. The dereferencing operator will not return *d_begin, but rather d_end[-1]: the element just before the element to which d_end points. All this is allowed, as Vector::begin() and Vector::end() return random access iterators themselves, for which pointer arithmetic operations are defined.

Since the reverse_iterator class declares Vector2<Type> as a friend, the Vector2 class can implement two members, rbegin() and rend() returning reverse_iterator objects:

    reverse_iterator rbegin()
    {
        return reverse_iterator(begin(), end());
    }
    reverse_iterator rend()
    {
        return reverse_iterator(begin(), begin());
    }
Apart from these two members, the Vector2 class could define two constructors, analogous to the constructors in the Vector class. E.g.,
    Vector2()
    :
        Vector<Type>()
    {}

In order to allow programs to use these reverse iterators, a public copy constructor and overloaded assignment operator is needed in the class reverse_iterator.

Since the Type pointers in reverse_iterator are merely used for pointing to existing information, a destructor is not required. Of course, this implies that the usefulness of a reverse_iterator terminates if the object from which it was obtained goed out of scope. It is the responsibility of the programmer to make sure this doesn't cause disaster. In the public section of reverse_iterator we add the following members (implementations given as inline code):

    reverse_iterator(reverse_iterator const &other)
    :
        d_begin(other.d_begin),
        d_end(other.d_end)
    {}

    reverse_iterator &operator=(reverse_iterator const &other)
    {
        d_begin(other.d_begin);
        d_end(other.d_end);
    }
The implementation shown below is fairly Spartan: illegal pointer values are not detected. As the generic algorithms do not require these checks we've left them out. Of course, one could modify the implementation and incorporate checks at several locations. It's up to the reader.

Random access iterators require us to implement a series of member functions. Here they are, as they should appear in the public section of the reverse_iterator nested class. Most implementations will be self-explanatory, and consist of single lines of code:

    #include "vector2.h"

    #define VectorRevIt Vector2<Type>::reverse_iterator

    template <typename Type>
    VectorRevIt::reverse_iterator(reverse_iterator const &other)
    :
        d_begin(other.d_begin),
        d_end(other.d_end)
    {}    

    template <typename Type>
    VectorRevIt::reverse_iterator(Type *begin, Type *end)
    :
        d_begin(begin),
        d_end(end)
    {}    

    template <typename Type>
    Type &VectorRevIt::operator*() const
    {
        return d_end[-1];
    }

    template <typename Type>
    Vector2<Type>::reverse_iterator VectorRevIt::operator++(int)
    {
        return reverse_iterator(d_begin, d_end--);
    }

    template <typename Type>
    Vector2<Type>::reverse_iterator &VectorRevIt::operator++()
    {
        --d_end;
        return *this;
    }

    template <typename Type>
    Vector2<Type>::reverse_iterator &VectorRevIt::operator--()
    {
        ++d_end;
        return *this;
    }

    template <typename Type>
    Vector2<Type>::reverse_iterator VectorRevIt::operator--(int)
    {
        return reverse_iterator(d_begin, d_end++);
    }

    template <typename Type>
    bool VectorRevIt::operator<(reverse_iterator const &other) const
    {
        return d_end > other.d_end;
    }

    template <typename Type>
    bool VectorRevIt::operator==(reverse_iterator const &other) const
    {
        return other.d_end == d_end;
    }

    template <typename Type>
    int VectorRevIt::operator-(reverse_iterator const &other) const
    {
        return other.d_end - d_end;
    }

    template <typename Type>
    Vector2<Type>::reverse_iterator VectorRevIt::operator+(int step) const
    {
        return reverse_iterator(d_begin, d_end - step);
    }

    template <typename Type>
    Vector2<Type>::reverse_iterator VectorRevIt::operator-(int step) const
    {
        return operator+(-step);
    }
The following member deserve special attention: For illustration purposes we show how to implement some of these member functions outside of the template class interface:
    template <typename Type>                // this is a template function
                                            // the class in which the 
    Vector2<Type>::reverse_iterator<Type>:: // constructor is defined
    reverse_iterator(Type *begin, Type *end)// the constructor itself
    :
        d_begin(begin),
        d_end(end)
    {}    

    template <typename Type>                // this is a template function
    Vector2<Type>::reverse_iterator         // this is its return type
                                            // the class in which the funcion
    Vector2<Type>::reverse_iterator<Type>:: // is defined
    operator+(int step) const               // this is the function itself
    {
        return reverse_iterator(d_begin, d_end - step);
    }
Finally, an example of a program using the reversed_iterator:
    #include <iostream>
    #include "vector3.h"

    using namespace std;
    
    int main()
    {
        Vector2<int>
            vi;

        for (int idx = 0; idx < 10; ++idx)
            vi.push_back(idx);

        copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " "));
        cout << endl;

        random_shuffle(vi.rbegin(), vi.rend());
        copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " "));
        cout << endl;

        sort(vi.rbegin(), vi.rend());          
        copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " "));
        cout << endl;

        reverse(vi.rbegin(), vi.rend());
        copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " "));
        cout << endl;

        copy(vi.rbegin(), vi.rend(), ostream_iterator<int>(cout, " "));
        cout << endl;
    }

    /*
        Generated output:
    0 1 2 3 4 5 6 7 8 9 
    7 1 5 9 3 4 6 0 2 8 
    9 8 7 6 5 4 3 2 1 0 
    0 1 2 3 4 5 6 7 8 9 
    9 8 7 6 5 4 3 2 1 0 
    */

19.7: Using Bison and Flex

The example discussed in this section digs into the peculiarities of using a parser- and scanner generator with C++. Once the input for a program exceeds a certain level of complexity, it's advantageous to use a scanner- and parser-generator to creat the code which does the actual input recognition.

The current example assumes that the reader knows how to use the scanner generator flex and the parser generator bison. Both bison and flex are well documented elsewhere. The original predecessors of bison and flex, called yacc and lex are described in several books, e.g. in O'Reilly's book `lex & yacc'.

However, scanner- and parser generators are also (and maybe even more commonly, nowadays) available as free software. Both bison and flex can be obtained from ftp://prep.ai.mit.edu/pub/gnu. Flex will create a C++ class when called as flex++, or when the -+ flag is used.

For bison the situation is a bit more complex. Scattered over the Internet several bison++ archives can be found (e.g., in ftp://ftp.rug.nl/contrib/frank/software/linux/bison/ or rzbsdi01.uni-trier.de). The information in these archives usually dates back to 1993, irrespective of the version number mentioned with the archive itself.

Using flex++ and bison++ a class-based scanner and parser can be generated. The advantage of this approach is that the interface to the scanner and the parser tends to become somewhat cleaner than without using the class interface.

Below two examples are given. In the first example only a lexical scanner is used to monitor the production of a file from several parts. This example focuses on the lexical scanner, and on switching files while churning through the information. The second example uses both a scanner and a parser to transform standard arithmetic expressions to their postfix notation, commonly encountered in code generated by compilers and in HP-calculators. The second example focuses on the parser.

19.7.1: Using Flex++ to create a scanner

In this example a lexical scanner is used to monitor the production of a file from several subfiles. This example focuses on the lexical scanner, and on switching files while churning through the information.

The setup is as follows: The input-language knows of an #include statement, which is followed by a string indicating the file which should be included at the location of the #include.

In order to avoid complexities that have irrelevant to the current example, the format of the #include statement is restricted to the form #include <filepath>. The file specified between the pointed brackets should be available at the location indicated by filepath. If the file is not available, the program terminates using a proper error message.

The program is started with one or two filename arguments. If the program is started with just one filename argument, the output is written to the standard output stream cout. Otherwise, the output is written to the stream whose name is given as the program's second argument.

The program uses a maximum nesting depth. Once the maximum is exceeded, the program terminates with an appropriate error message. In that case, the filename stack indicating where which file was included should be printed.

One small extra feature is that comment-lines should be recognized: include directives in comment-lines should be ignored, comment being the standard C++ comment-types.

The program is created in the following steps:

19.7.1.1: The flex++ specification file

The organization of the lexical scanner specification file is similar to the one used with flex. However, flex++ creates a class ( yyFlexLexer) from which the class Scanner will be derived.

The code associated with the regular expression rules will be located inside the class yyFlexLexer. However, it would be handy to access the member functions of the derived class within that code.

Fortunately, inheritance helps us to realize this. In the specification of the class yyFlexLexer(), we notice that the function yylex() is a virtual function. In the FlexLexer.h header file we see virtual int yylex():

    class yyFlexLexer: public FlexLexer 
    {
        public:
            yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 );
    
            virtual ~yyFlexLexer();
    
            void yy_switch_to_buffer( struct yy_buffer_state* new_buffer );
            struct yy_buffer_state* yy_create_buffer( istream* s, int size );
            void yy_delete_buffer( struct yy_buffer_state* b );
            void yyrestart( istream* s );
    
            virtual int yylex();

            virtual void switch_streams( istream* new_in, ostream* new_out );
    };
Consequently, if yylex() is defined in a derived class, then this function of the derived class will be called from a base class (i.e., yyFlexLexer) pointer. Since the yylex() function of the derived class is called, that function will have access to the members of its class, and to the public and protected members of its base class.

The context in which the generated scanner is placed is (by default) the function yyFlexLexer::yylex(). However, this context can be changed by defining the YY_DECL-macro. This macro, if defined, determines the context in which the generated scanner will be placed. So, in order to make the generated scanner part of the derived class function yylex(), three things must be done:

The definition of the YY_DECL macro and the yyFlexLexer::yylex() function can conveniently be placed in the lexer specification file, as shown below.

Looking at the regular expressions themselves, notice that we'll need rules for the recognition of the comment, for the #include directive, and for the remaining characters. This is all fairly standard practice. When an #include directive is detected, the directive is parsed by the scanner. This too is common practice. Here is what will happen in the current setup:

The lexical scanner specification file has three sections:

In the current example, the main task of the lexer is to copy information from the istream *yyin to the ostream *yyout, for which the predefined macro ECHO can be used. The complete and annotated lexical scanner specification file to be processed by flex++ is:
%{
/* ----------------------------------------------------------------------------
                                 C++ -preamble.
   Include header files, other than those generated by flex++ and bison++.
      E.g., include the interface to the class derived from yyFlexLexer
----------------------------------------------------------------------------*/

                            // the yylex() function that's actually
                            // used
#define YY_DECL int Scanner::yylex()

#include "scanner.ih"        // The interface of the derived class

int yyFlexLexer::yylex()    // not called: overruled by
{                           // Scanner::yylex()
    return 0;
}
  
%}

/* ----------------------------------------------------------------------------
                              Flex++ symbol area
                              ~~~~~~~~~~~~~~~~~~
      The symbols mentioned here are used for defining e.g., a mini scanner
---------------------------------------------------------------------------- */
%x      comment 
%x      include
%option yylineno

eolnComment     "//".*
anyChar         .|\n

/* ----------------------------------------------------------------------------
                               Flex rules area:
                               ~~~~~~~~~~~~~~~~
     Regular expressions below here define what the lexer will recognize.
---------------------------------------------------------------------------- */
%%
    /*
        The comment-rules: comment lines are ignored.    
    */
{eolnComment}
"/*"                    BEGIN comment;
<comment>{anyChar}
<comment>"*/"           BEGIN INITIAL;

    /*                
        File switching: #include <filepath>
    */
#include[ \t]+"<"       BEGIN include;
<include>[^ \t>]+       nextSource = yytext;
<include>">"[ \t]*\n    {
                            BEGIN INITIAL;
                            performSwitch();
                        }
<include>{anyChar}      throw invalidInclude;

    /* 
        The default rules: eating all the rest, echoing it to output    
    */ 
{anyChar}               ECHO;

    /*
        The <<EOF>>)rule: pop a pushed file, or terminate the lexer
    */
<<EOF>>                 {
                            if (!popSource())
                                yyterminate();
                        }
Since the derived class is able to access the information stored within the lexical scanner itself (it can even access the information directly, since the data members of yyFlexLexer are protected, and thus accessible to derived classes), most processing can be done by the derived class' member functions. This results in a very clean setup of the lexer specification file, which requires hardly any code in the preamble.

19.7.1.2: The derived class: Scanner

The class Scanner is derived from the class yyFlexLexer as generated by flex++. The derived class has access to the data controlled by the lexical scanner. In particular, the derived class has access to the following data members:

Other members are available as well, but we feel they are less often used. Details can be found in the file FlexLexer.h, which is part of the flex distribution.

The class Scanner performs two tasks:

Several member functions are used to accomplish these tasks. As they are auxiliary to the scanner, they are private members. In practice, these private members are developed once the need for them arises. In the following interface of the Scanner class the final header file is given. Note that, apart from the private member functions, several private data members are used as well. These members are initialized by the Scanner() constructor and are used in the private member functions. They are discussed below, in the context of the member functions that are using them. Here is the scanner.h header file:
#include <FlexLexer.h>  // provides yyFlexLexer interface
#include <fstream>
#include <string>
#include <stack>
#include <vector>

class Scanner: public yyFlexLexer
{
    public:         
        enum Error
        {
            invalidInclude,
            circularInclusion,
            nestingTooDeep,
            cantRead,
        };
                
        Scanner(istream *yyin, string const &initialName);
        string const &lastFile()
        {
            return fileName.back();
        }
        void stackTrace();  // dumps filename stack to cerr
        int yylex();        // overruling yyFlexLexer's yylex()
    private:
        Scanner(Scanner const &other);      // no Scanner copy-initialization
        Scanner &operator=(Scanner const &other);   // no assignment either

        bool popSource();   // true: source popped
        void performSwitch();
        void throwOnCircularInclusion();

        stack<yy_buffer_state *>
            state;
        vector<string>
            fileName;
        string
            nextSource;

        static unsigned const
            sizeof_buffer = 16384,
            maxDepth = 10;
};
The scanning process proceeds as follows: Once the scanner extracts a filename from an #include directive, a switch to another file is performed by performSwitch(). If the filename could not be extracted, the scanner throws an invalidInclude exception value.

The performSwitch() member and the matching function popSource() handle the file switch. In particular, note that yylineno is not updated when a file switch is performed. If line numbers are to be monitored, then the current value of yylineno should be pushed on a stack, and yylineno should be reset by performSwitch(), while popSource() should reinstate a former value of yylineno by popping a previously pushed value from the stack.

In the current implementation of Scanner a simple stack of yy_buffer_state pointers is maintained. Changing that into a stack of pair<yy_buffer_state *, unsigned> elements allows you to save the line numbers. This modification is left as an exercise to the reader.

As mentioned. performSwitch() performs the actual file-switching. The yyFlexLexer class provides a series of member functions that can be used for file switching purposes. The file-switching capability of a yyFlexLexer object is founded on the struct yy_buffer_state, containing the state of the scan-buffer of the file that is currently scanned by the lexical scanner. This buffer is pushed on the state stack when an #include is encountered. Then this buffer is replaced by the buffer that will be created for the file that is mentioned in the #include directive. The actual file switching is realized as follows:

The sources for the member functions performSwitch() and throwOnCircularInclusion() are:
#include "scanner.ih"

void Scanner::performSwitch()
{   
    if (state.size() == maxDepth)
        throw nestingTooDeep;

    fileName.push_back(nextSource);
    throwOnCircularInclusion();
    
    ifstream *newStream = new ifstream(nextSource.c_str());
        
    if (!*newStream)
        throw cantRead;

    state.push(yy_current_buffer);
    yy_switch_to_buffer(yy_create_buffer(newStream, sizeof_buffer));
}
#include "scanner.ih"

void Scanner::throwOnCircularInclusion()
{   
    vector<string>::iterator
        it = find(fileName.begin(), fileName.end(), nextSource);

    if (it != fileName.end())
        throw circularInclusion;
}
The member function popSource() is called to pop the previously pushed buffer from the stack, to continue its scan just beyond the just processed #include directive. The popSource() function first inspects the size of the state stack: if empty, false is returned and the fuction terminates. if not empty, then the current buffer is deleted, to be replaced by the state waiting on top of the stack. The file switch is performed by the yyFlexLexer members yy_delete_buffer() and yy_switch_to_buffer. Note that yy_delete_buffer() takes care of the closing of the ifstream and of deleting the memory allocated for this stream in performSwitch(). Furthermore, the filename that was last entered in the fileName vector is removed. Having done all this, the function returns true:
#include "scanner.ih"

bool Scanner::popSource()
{       
    if (state.empty())
        return false;

    yy_delete_buffer(yy_current_buffer);
    yy_switch_to_buffer(state.top());
    state.pop();
    fileName.pop_back();
    return true;
}
These functions complete the implementation of the complete lexical scanner. the lexical scanner itself is stored in the Scanner::yylex() function. The Scanner object itself only has three public member functions and a constructor. One member function is yylex(), the other two allow the called to dump a stack trace and to obtain the name of the file that was last processed by the scanner. The constructor is a simple piece of code. Here is its source:
#include "scanner.h"

Scanner::Scanner(istream *yyin, string const &initialName)
{
    switch_streams(yyin, yyout);
    fileName.push_back(initialName);
}

19.7.1.3: The main() function

The main program is very simple. As the program expects a filename to start the scanning process at, initially the number of arguments is checked. If at least one argument was given, then a ifstream object is created. If this object can be created, then a Scanner object is constructed, receiving the address of the ifstream object and the name of the file as its arguments. Then the yylex() member function of the Scanner object is called. This function is inherited from the Scanner's base class yyFlexLexer. If anything fails inside the scanner, an exception is thrown, which is caught at the end of the main() function:

/*                              lexer.cc

   A C++ main()-frame generated by C++ for lexer.cc

*/

#include "lexer.h"           /* program header file */

int main(int argc, char **argv)
{       
    if (argc == 1)
    {
        cerr << "Filename argument required\n";
        exit (1);
    }

    ifstream yyin(argv[1]);

    if (!yyin)
    {
        cerr << "Can't read " << argv[1] << endl;
        exit(1);
    }
     
    Scanner scanner(&yyin, argv[1]);

    try
    {
        return scanner.yylex();
    }
    catch (Scanner::Error err)
    {
        char const *msg[] =
        {
            "Include specification",
            "Circular Include",
            "Nesting",
            "Read",
        };

        cerr << msg[err] << " error in " << scanner.lastFile() << 
                            ", line " << scanner.lineno() << endl;
        scanner.stackTrace();
        return 1;
    }
}

19.7.1.4: Building the scanner-program

The final program is constructed in two steps. These steps are given for a Unix system, on which flex++ and the Gnu C++ compiler g++ have been installed:

For the purpose of debugging a lexical scanner, the rules that are matched and the tokens that are returned provide useful information. When flex++ is called with the -d flag, debugging code will be part of the generated scanner. Apart from that, the debugging code must be activated. Assuming the scanner object is called scanner, the statement
scanner.set_debug(1);

must be given after the construction of the scanner object.

19.7.2: Using both bison++ and flex++

When an input language exceeds a certain level of complexity, a parser is generally needed to be able to control the complexity of the input language. In these cases, a parser generator is used to generate the code that is required to determine the grammatical correctness of the input. The function of the lexical scanner is to provided chunks of the input, called tokens, for the parser to work with.

Starting point for a program that uses both a parser and a scanner is the grammar: the grammar is specified first. This results in a set of tokens which can be returned by the lexical scanner (commonly called the lexer).

Finally, auxiliary code is provided to `fill in the blanks': the actions which are performed by the parser and the lexer are not normally specified in the grammatical rules or lexical regular expressions, but are executed by member functions, which are called from within the parser's rules or are associated with the lexer's regular expressions.

In the previous section we've seen an example of a C++ class generated by flex++. In the current section we concern ourselves with the parser. The parser can be generated from a grammar specification to be processed by the program bison++. The grammar specification required for bison++ is similar to the specifications required for bison, but bison++ will generate a class, rather than a single function.

In the next sections we'll develop a program converting infix expressions, in which binary operators are written between their operands, to postfix expressions, in which binary operators are written behind their operands. Comparably, the unary operator - will be converted from its prefix notation to a postfix form (We ignore the unary + operator here for the sake of brevity).

Our calculator will recognize a minimal set of operators: multiplication, addition, parentheses, and the unary minus. We'll distinguish real numbers from integers, to illustrate a subtlety in the bison-like grammar specifications, but that's about it: the purpose of this section is, after all, to illustrate a C++ program, using a parser and a lexer, and not to construct a full-fledged calculator.

In the upcoming sections we'll develop the grammar specification for bison++. Then, the regular expressions for the scanner are specified according to the requirements of flex++. Finally the program is constructed.

The class-generating bison software (bison++) is not widely available. The version used by us is 2.20. It can be obtained at ftp://ftp.rug.nl/contrib/frank/software/linux/bison/bison++2.20.tar.gz.

19.7.2.1: The bison++ specification file

The grammar specification file used for bison++ is comparable to the specification file used for bison. Differences are related to the class nature of the resulting parser. Our calculator will distinguish real numbers from integers, and will support the basic set of arithmetic operators.

The bison++ specification file consists of the following sections:

19.7.2.2: The bison++ token section

The token section contains all the tokens that are used in the grammar, as well as the priority rules as used for the mathematical operators. Moreover, several new items can be declared here:

19.7.2.3: The bison++ grammar rules

The rules and actions of the grammar are specified as usual. The grammar for our little calculator is given below. A lot of rules, but they illustrate the use of non-terminals associated with value-types.

    lines:
        lines
        line
    |
        line
    ;
    
    line:
        intExpr
        '\n'
        {
            cerr << "int: " << $1 << endl;
        }
    |
        doubleExpr
        '\n'
        {
            cerr << "double: " << $1 << endl;
        }
    |
        '\n'
        {
            cout << "Good bye\n";
            YYACCEPT;
        }
    |
        error
        '\n'
    ;
    
    intExpr:
        intExpr '*' intExpr
        {
            $$ = $1 * $3;
        }
    |
        intExpr '+' intExpr
        {
            $$ = $1 + $3;
        }
    |
        '(' intExpr ')'
        {
            $$ = $2;
        }
    |
        '-' intExpr         %prec UnaryMinus
        {
            $$ = -$2;
        }
    |
        INT
        {
            $$ = atoi(lexer.YYText());
        }
    ;
    
    doubleExpr:
        doubleExpr '*' doubleExpr
        {
            $$ = $1 * $3;
        }
    |
        doubleExpr '+' doubleExpr
        {
            $$ = $1 + $3;
        }
    |
        doubleExpr '*' intExpr
        {
            $$ = $1 * $3;
        }
    |
        doubleExpr '+' intExpr
        {
            $$ = $1 + $3;
        }
    |
        intExpr '*' doubleExpr
        {
            $$ = $1 * $3;
        }
    |
        intExpr '+' doubleExpr
        {
            $$ = $1 + $3;
        }
    |
        '(' doubleExpr ')'
        {
            $$ = $2;
        }
    |
        '-' doubleExpr         %prec UnaryMinus
        {
            $$ = -$2;
        }
    |
        DOUBLE
        {
            $$ = atof(lexer.YYText());
        }
    ;
Using these rules a simple calculator is defined in which integer and real values can be negated, added, and multiplied, and in which standard priority rules can be circumvented using parentheses. The rules show the use of typed nonterminal symbols: doubleExpr is linked to real (double) values, intExpr is linked to integer values. Precedence and type association is defined in the token section of the parser specification file, which is:
    %name  Parser                
    %union 
    {
        int i;
        double d;
    };
    %token      INT
                DOUBLE
    %type   <i> intExpr
            <d> doubleExpr
    
    %left   '+'
    %left   '*'
    %right  UnaryMinus
    
    %define MEMBERS                                         \ 
        virtual ~Parser()   {}                              \ 
        private:                                            \ 
            Scanner     lexer;              
    %define LEX_BODY {return lexer.yylex();}
    
    %define ERROR_BODY { cerr << "error encountered\n"; }
In the token section we see the use of the %type specifiers, connecting intExpr to the i-field of the semantic-value union, and connecting doubleExpr to the d-field. At first sight it looks complex, since the expression rules must be included for each individual return type. On the other hand, if the union itself would have been used, we would have had to specify somewhere in the returned semantic values what field to use: less rules, but more complex and error-prone code.

Also, note that the lexical scanner is included as a member of the parser. Although there's really no need here to define the scanner as a separate Scanner class, but doing so shows the general approach we're advocating here. In the current situation, the definition of the scanner class is extremely simple:

#include <FlexLexer.h>  // provides yyFlexLexer interface

class Scanner: public yyFlexLexer
{
};
In general, the scanner can well be defined as a (private) member of the parser, as it's not used outside of the parser object. The parser's virtual destructor is included as an member to prevent the compiler from complaining about the parser having a non-virtual destructor. E.g.,
    warning: `class Parser' has virtual functions but non-virtual destructor

19.7.2.4: The flex++ specification file

The flex-specification file that is used by our calculator is simple: blanks are skipped, single characters are returned, and numerical values are returned as either Parser::INT or Parser::DOUBLE tokens. Here is the complete flex++ specification file:

%{ 
#include "parser.h"
%}
%option yylineno
%%

[ \t]                       ;
[0-9]+                      return(Parser::INT);

"."[0-9]*                   |                                        
[0-9]+("."[0-9]*)?          return(Parser::DOUBLE);

exit                        |
quit                        return (Parser::DONE);

.|\n                        return (*yytext);

19.7.2.5: The generation of the code

The code is generated in the same way as with bison and flex. To order bison++ to generate the files parser.cc and parser.h, give the command:

    bison++ -d -o parser.cc parser
Next, flex++ will generate code on lexer.cc using the command
    flex++ -I -olexer.cc lexer
Note here that flex++ expects no blanks between the -o flag and lexer.cc.

On Unix systems, linking and compiling the generated sources and the source for the main program (given below) is realized by the command:

    g++ -o calc -Wall *.cc -lfl -s
Note the fact that the program is linked against the libfl.a library. If it's not provided, the linker will complain about unresolved references like yywrap().

A source in which the main() function and the parser object (having the lexical scanner as one of its data members) is, finally:

#include "parser.h"

int main()
{
    Parser parser;

    cout << "Enter (nested) expressions containing ints, doubles, *, + and "
            "unary -\n"
            "operators. Enter an empty line, exit or quit to exit.\n";

    return parser.yyparse();
}