We're always interested in getting feedback. E-mail us if you like this guide, if you think that important material is omitted, if you encounter errors in the code examples or in the documentation, if you find any typos, or generally just if you feel like e-mailing. Send your email to Frank Brokken. Please state the document version you're referring to, as found in the title (in this document: 5.2.0a) and please state the paragraph you're referring to.
All mail received is seriously considered, and new (sub)releases of the Annotations will normally reflect your suggestions for improvements. Except for the incidental case I will not otherwise acknowledge the receipt of suggestions for improvements. Please don't misinterpret this for lack of appreciation.
This chapter presents a number of concrete examples of programming in
C++. Items from this document such as virtual functions, static
members, etc. are rediscussed. Examples of container classes are shown.
Also, with the advent of the
ANSI/ISO standard, classes supporting streams
based on
file descriptors are no longer available,
including the
Gnu procbuf extension. These classes were frequently used
in older C++ programs. This section of the C++ Annotations develops an
alternative: classes extending streambuf, allowing the use of file
descriptors, and classes around the
fork()
system call.
Another example digs into the peculiarities of using a parser- and scanner-generator with C++. Once the input for a program exceeds a certain level of complexity, it's advantageous to use a scanner- and parser-generator for creating the code which does the actual input recognition. The example describes the usage of these tools in a C++ environment.
This section uses streambuf as the starting point for the construction of classes interfacing file descriptors. First we will discuss the construction of a class which may be used to write to a device defined by its file descriptor: it may be a file, but it could also be a pipe or a socket. Section 19.1.2 will discuss the matter of reading from devices that are known by their file descriptors, while section 19.3 will reconsider the question of redirection, which was discussed earlier in section 5.8.3.
Basically, deriving a class for
output operations is very simple. The only
member function that must be overridden is the
virtual int overflow(int
c)
member. This member is responsible for writing
characters to the device in the case where there is no buffer or no more room
in the buffer. If fd represents the file descriptor to write to, and if we
decide against using a buffer, the member overflow() can be as simple as:
class UnbufferedFD: public std::streambuf
{
public:
int overflow(int c)
{
if (c != EOF)
{
char ch = static_cast<char>(c);
if (write(fd, &ch, 1) != 1)
return EOF;
}
return c;
}
...
};
The received argument is either written as a char to the file descriptor, or EOF is returned. However, this approach does not use an output buffer. As the use of a buffer is strongly advised (see also the next section), we will discuss the construction of a class using an output buffer in somewhat greater detail.
When an output buffer is to be used, the overflow() member gets a bit more
complex, due to the fact that it is now called only when the buffer is
full. So, first we will have to flush the buffer, for which the (virtual)
function
streambuf::sync() should be used. Defining sync() and using
it in overflow() is not all that must be done: the information to write
might be less than the size of the buffer. So, at the end of the
lifetime
of our special streambuf object, it might find itself having a buffer that
is only partially full. So, we must make sure that the buffer is flushed by
the time our object goes
out of scope. This is of course very simple:
sync() should be called in the
destructor as well.
Now that we've considered the consequences of using an output buffer,
we're almost ready to construct our derived class. We add a couple of extra
features. First, we permit the specification of the size of the output buffer
by the user of the class. Second, we permit the construction of an object of
the class before the file descriptor to use is actually known. Later, in
section 19.4 we'll encounter a situation where this feature will be used.
In order to save some space, the successful operation of various calls was not checked. In `real life' implementations these checks should of course not be omitted. Here is the class ofdnstreambuf:
#ifndef _OFDNSTREAMBUF_H_
#define _OFDNSTREAMBUF_H_
#include <unistd.h>
#include <streambuf>
#include <ios>
class ofdnstreambuf: public std::streambuf
{
unsigned d_bufsize;
int d_fd;
char *d_buffer;
public:
ofdnstreambuf()
:
d_bufsize(0),
d_buffer(0)
{}
ofdnstreambuf(int fd, unsigned bufsize = 1)
{
open(fd, bufsize);
}
~ofdnstreambuf()
{
if (d_buffer)
{
sync(); // 1
delete[] d_buffer;
}
}
void open(int fd, unsigned bufsize = 1)
{
d_fd = fd;
d_bufsize = bufsize == 0 ? 1 : bufsize;
d_buffer = new char[d_bufsize];
setp(d_buffer, d_buffer + d_bufsize); // 2
}
int sync()
{
if (pptr() > pbase())
{
write(d_fd, d_buffer, pptr() - pbase()); // 3
setp(d_buffer, d_buffer + d_bufsize); // 4
}
return 0;
}
int overflow(int c)
{
sync(); // 5
if (c != EOF)
{
*pptr() = static_cast<char>(c); // 6
pbump(1);
}
return c;
}
};
#endif
At 1, in the destructor, sync() is called to write any unprocessed characters in the output buffer to the device.
At 2, in the open() member, the buffer is initialized. Using setp() the begin and end points of the buffer are passed to the streambuf base class, allowing it to set up pbase(), pptr() and epptr().
At 3, in sync(), the unflushed characters in the buffer are written to the device. Then, at 4, the buffer is reinitialized: the buffer pointers maintained by streambuf are reset to their initial values by calling setp() once again. Note that sync() should return 0 after a successful flush operation.
At 5, in overflow(), sync() is called to flush the now filled up output buffer to the device. As this recreates an empty buffer, the character c which could not be written to the buffer is entered into the buffer at 6, using the member functions pptr() and pbump(). Notice that entering a character into the buffer is realized using available member functions of streambuf.
The following program uses the ofdnstreambuf class to copy its standard input to file descriptor STDOUT_FILENO, which is the symbolic name of the file descriptor used for the standard output. Here is the program:
#include "fdout.h"
#include <string>
#include <iostream>
#include <istream>
using namespace std;
int main(int argc)
{
ofdnstreambuf fds(STDOUT_FILENO, 500);
ostream os(&fds);
switch (argc)
{
case 1:
os << "COPYING cin LINE BY LINE\n";
for (string s; getline(cin, s); )
os << s << endl;
break;
case 2:
os << "COPYING cin BY EXTRACTING TO os.rdbuf()\n";
cin >> os.rdbuf(); // Alternatively, use: cin >> &fds;
break;
case 3:
os << "COPYING cin BY INSERTING cin.rdbuf() into os\n";
os << cin.rdbuf();
break;
}
}
When a class is derived from streambuf to read from a device, the class should use an input buffer of at least one character, to allow the use of the member functions istream::putback() or istream::unget(). Stream classes (like istream) normally allow the ungetting of at least one character using their member functions putback() or unget(). This is important, as these stream classes usually interface to streambuf objects. Although strictly speaking it is not necessary to implement a buffer in classes derived from streambuf, we strongly suggest using buffers in these cases: the implementation is very simple and straightforward, and the applicability of such classes is greatly improved. Therefore, in all our classes derived from the class streambuf at least a buffer of one character will be defined.
19.1.2.1: Using a one-character buffer
When deriving a class (e.g., ifdstreambuf) from streambuf using a buffer of at least one character, at the very least the member streambuf::underflow() should be overridden, as this is the member to which all requests for input eventually degenerate. Since a buffer is also needed, the member streambuf::setg() is used to inform the streambuf base class of the dimensions of the input buffer, so that it is able to set up its input buffer pointers in such a way that eback(), gptr(), and egptr() return correct values. Here is the corresponding class definition:
#include <unistd.h>
#include <streambuf>
#include <ios>
class ifdstreambuf: public std::streambuf
{
protected: // 1
int d_fd;
char d_buffer[1];
public:
ifdstreambuf(int fd)
:
d_fd(fd)
{
setg(d_buffer, d_buffer + 1, d_buffer + 1); // 2
}
int
underflow()
{
if (gptr() < egptr()) // 3
return *gptr(); // 4
if (read(d_fd, d_buffer, 1) <= 0) // 5
return EOF;
setg(d_buffer, d_buffer, d_buffer + 1); // 6
return *gptr();
}
};
Please note the following:
At 1, the data members are declared in a protected section, so that derived classes (e.g., see section 19.1.2.3) can access them. A small array of just one character acts as the input buffer.
At 2, setg() sets gptr() equal to egptr(). Since this implies that the buffer is empty, underflow() will be called to refill the buffer.
At 3, underflow() first checks whether the buffer is really empty. If not, the character to be retrieved next is returned at 4.
At 5, if the device fails to produce another character, EOF is returned. More sophisticated implementations could react more intelligently here, of course.
At 6, setg() is called once again to set up streambuf's buffer pointers correctly.
The ifdstreambuf class is used in the following program:
#include "ifdbuf.h"
#include <iostream>
#include <istream>
using namespace std;
int main(int argc)
{
ifdstreambuf
fds(0);
istream
is(&fds);
cout << is.rdbuf();
}
19.1.2.2: Using an n-character buffer
How complex would things get if we decided to use a buffer of substantial size? Not that complex. The following class allows us to specify the size of a buffer, and it is basically the same class as ifdstreambuf from the previous section. To make things a bit more interesting, in the current class ifdnstreambuf the member streambuf::xsgetn() is also overridden, to optimize the reading of series of characters at once. Also, a default constructor is provided which can be used in combination with the added open() member in cases where we want to construct an istream object before we know the file descriptor that must be used. Later, in section 19.4 we'll encounter such a situation. In order to save some space, the successful operation of various calls was not checked. In `real life' implementations these checks should of course not be omitted. Here is the new class ifdnstreambuf:
#ifndef _IFDNBUF_H_
#define _IFDNBUF_H_
#include <unistd.h>
#include <streambuf>
#include <ios>
#include <cstring>      // for memcpy()
class ifdnstreambuf: public std::streambuf
{
protected:
int d_fd;
unsigned d_bufsize;
char* d_buffer;
public:
ifdnstreambuf()
:
d_bufsize(0),
d_buffer(0)
{}
ifdnstreambuf(int fd, unsigned bufsize = 1)
{
open(fd, bufsize);
}
~ifdnstreambuf()
{
if (d_bufsize)
{
close(d_fd);
delete[] d_buffer; // 1
}
}
void open(int fd, unsigned bufsize = 1)
{
d_fd = fd;
d_bufsize = bufsize;
d_buffer = new char[d_bufsize];
setg(d_buffer, d_buffer + d_bufsize, d_buffer + d_bufsize);
}
int underflow()
{
if (gptr() < egptr())
return *gptr();
int nread = read(d_fd, d_buffer, d_bufsize);
if (nread <= 0)
return EOF;
setg(d_buffer, d_buffer, d_buffer + nread); // 2
return *gptr();
}
std::streamsize xsgetn(char *dest, std::streamsize n) // 3
{
int nread = 0;
while (n)
{
if (!in_avail())
{
if (underflow() == EOF) // 4
break;
}
int avail = in_avail();
if (avail > n)
avail = n;
memcpy(dest + nread, gptr(), avail); // 5
gbump(avail); // 6
nread += avail;
n -= avail;
}
return nread;
}
};
#endif
With this class, please note:
At 1, the buffer is deleted and the device is closed when the ifdnstreambuf object goes out of scope.
At 2, in underflow(), the input buffer pointers of streambuf are reset in accordance with the actual number of characters read from the device.
At 3, xsgetn() is overridden. In a loop, n is reduced until it reaches 0, at which point the function terminates. Alternatively (at 4), the member returns if underflow() fails to obtain more characters. At 5 and 6 reading is optimized: instead of calling streambuf::sbumpc() n times, a block of avail characters is copied to the destination, using streambuf::gbump() to consume avail characters from the buffer with one function call.
The member xsgetn() is called by streambuf::sgetn(), which is a streambuf member. The following example illustrates the use of this member function with an ifdnstreambuf object:
#include "ifdnbuf.h"
#include <iostream>
#include <istream>
using namespace std;
int main(int argc)
{
ifdnstreambuf fds(0, 30); // internally: 30 char buffer
char buf[80]; // main() reads blocks of 80
// chars
while (true)
{
unsigned n = fds.sgetn(buf, 80);
if (n == 0)
break;
cout.write(buf, n);
}
}
19.1.2.3: Seeking positions in `streambuf' objects
When devices support seek operations, classes derived from
streambuf
should override
streambuf::seekoff() and
streambuf::seekpos(). In the
following class
ifdseek, which is derived from our previously constructed
class
ifdstreambuf, we assume that the device of which we have a
file descriptor available supports these seek operations. The class ifdseek is derived from ifdstreambuf, so it uses a character buffer of just one character. The facilities to perform seek operations, which are added to our new class ifdseek, will make sure that the input buffer is reset when a seek operation is requested. The class could also be derived from the class ifdnstreambuf, in which case the arguments used to reset the input buffer must be adapted in such a way that its second and third parameters point beyond the available input buffer. Here is the class definition:
#include "ifdbuf.h"
#include <unistd.h>
class ifdseek: public ifdstreambuf
{
typedef std::streambuf::pos_type pos_type; // 1
typedef std::streambuf::off_type off_type;
typedef std::ios::seekdir seekdir;
typedef std::ios::openmode openmode; // 2
public:
ifdseek(int fd) // 3
:
ifdstreambuf(fd)
{}
pos_type seekoff(off_type offset, seekdir dir, openmode) // 4
{
pos_type pos =
lseek
(
d_fd, offset,
(dir == std::ios::beg) ? SEEK_SET :
(dir == std::ios::cur) ? SEEK_CUR :
SEEK_END
);
if (pos < 0)
return -1;
setg(d_buffer, d_buffer + 1, d_buffer + 1); // 5
return pos;
}
pos_type seekpos(pos_type offset, openmode mode) // 6
{
return seekoff(offset, std::ios::beg, mode);
}
};
At 1 and 2, several types defined by the streambuf and ios classes are redefined locally, to prevent us from having to specify std::streambuf and std::ios again and again when referring to these types.
At 4, in seekoff(), the standard function lseek() is used to seek a new position in a file whose file descriptor is known.
At 5, following a successful seek operation the streambuf base class resets its input buffer pointers to an empty buffer, so that underflow() will refill the buffer at the next request for input.
At 6, seekpos() is overridden as well: it is actually defined by a call to seekoff().
An example program using ifdseek is the following. If this program is given its own source file using input redirection, then seeking is supported, and apart from the first line, every other line is shown twice:
#include "fdinseek.h"
#include <string>
#include <iostream>
#include <istream>
#include <iomanip>
using namespace std;
int main(int argc)
{
ifdseek fds(0);
istream is(&fds);
string s;
while (true)
{
if (!getline(is, s))
break;
streampos pos = is.tellg();
cout << setw(5) << pos << ": `" << s << "'\n";
if (!getline(is, s))
break;
streampos pos2 = is.tellg();
cout << setw(5) << pos2 << ": `" << s << "'\n";
if (!is.seekg(pos))
{
cout << "Seek failed\n";
break;
}
}
}
19.1.2.4: Multiple `unget()' calls in `streambuf' objects
As stated earlier
streambuf classes and classes derived from streambuf
should at least support ungetting the character that was last read. Special
care must be taken when series of unget() calls are to be supported. In
this section we will show how to construct a class which is derived from
streambuf and which allows a configurable number of
istream::unget()
or
istream::putback() calls. Support for multiple (say `n')
unget() calls is realized by setting aside an initial section of the input
buffer, which is then gradually filled up by the last n characters read.
Here is an implementation of a class supporting series of unget() calls:
#include <unistd.h>
#include <streambuf>
#include <ios>
#include <algorithm>
#include <cstring>      // for memcpy()
class fdunget: public std::streambuf
{
int d_fd;
unsigned d_bufsize;
unsigned d_reserved;
char* d_buffer;
char* d_base;
public:
fdunget (int fd, unsigned bufsz, unsigned unget) // 1
:
d_fd(fd),
d_reserved(unget) // 2
{
unsigned allocate = // 3
bufsz > d_reserved ?
bufsz
:
d_reserved + 1;
d_buffer = new char [allocate]; // 4
d_base = d_buffer + d_reserved; // 5
setg(d_base, d_base, d_base); // 6
d_bufsize = allocate - d_reserved; // 7
}
~fdunget()
{
delete[] d_buffer;
}
int
underflow()
{
if (gptr() < egptr())
return *gptr();
unsigned consumed = // 8
static_cast<unsigned>(gptr() - eback());
unsigned move = std::min(consumed, d_reserved); // 9
memcpy(d_base - move, egptr() - move, move); // 10
int nread = read(d_fd, d_base, d_bufsize); // 11
if (nread <= 0) // none read ->
return EOF;
setg(d_base - move, d_base, d_base + nread); // 12
return *gptr();
}
};
At 1 and 2, the constructor's parameter unget determines the size of a reserved area, consisting of the first d_reserved bytes of the buffer that is used.
At 3, the number of bytes to allocate is computed: it is the requested buffer size, but at least one byte more than reserved. So, a certain number of bytes can be read, and once reserved bytes have been read, at least reserved bytes can be ungot. At 4, the buffer is allocated.
At 5, d_base is initialized: it is located reserved bytes into the d_buffer area. This is always the point where refills of the buffer start.
At 6, streambuf's buffer pointers are set up. As no characters have been read as yet, all pointers are set to d_base. At the next read request underflow() will notice that no characters are available, and initially unget() will fail immediately. At 7, d_bufsize is set to the number of bytes that can be read into the buffer beyond the reserved area.
At 8, we have arrived at underflow(). Up to this point everything is standard, and now we determine how many characters were consumed. Remember that at 6 we ensured that the pointers indicated 0 bytes in the buffer, so consumed is either 0 or the actual number of bytes read into the buffer after a successful refill.
At 9, the number of characters to move into the reserved area is at most reserved, but it is less if fewer characters were consumed. So, move is set to the minimum of the number of consumed characters and reserved.
At 10, the last move characters are copied to the area just before d_base.
At 11, the buffer is refilled, starting from d_base, not from d_buffer.
At 12, streambuf's read buffer pointers are set up: eback() is set to move locations before d_base, thus defining the guaranteed unget-area; gptr() is set to d_base, since that's the location of the first character read after a refill; and egptr() is set just beyond the location of the last character read.
The following program illustrates the use of the class fdunget. The program reads at most 10 characters from the standard input, stopping at EOF. A guaranteed unget-buffer of 2 characters is defined, in a buffer of 3 characters. Just before reading a character, an attempt is made to unget at most 6 characters. This is of course never possible, but the program will neatly unget as many characters as possible, considering the actual number of characters read:
#include "fdunget.h"
#include <string>
#include <iostream>
#include <istream>
using namespace std;
int main(int argc)
{
fdunget
fds(0, 3, 2);
istream
is(&fds);
char
c;
for (int idx = 0; idx < 10; ++idx)
{
cout << "after reading " << idx << " characters:\n";
for (int ug = 0; ug <= 6; ++ug)
{
if (!is.unget())
{
cout
<< "\tunget failed at attempt " << (ug + 1) << "\n"
<< "\trereading: '";
is.clear();
while (ug--)
{
is.get(c);
cout << c;
}
cout << "'\n";
break;
}
}
if (!is.get(c))
{
cout << " reached\n";
break;
}
cout << "Next character: " << c << endl;
}
}
/*
Generated output after 'echo abcde | program':
after reading 0 characters:
unget failed at attempt 1
rereading: ''
Next character: a
after reading 1 characters:
unget failed at attempt 2
rereading: 'a'
Next character: b
after reading 2 characters:
unget failed at attempt 3
rereading: 'ab'
Next character: c
after reading 3 characters:
unget failed at attempt 4
rereading: 'abc'
Next character: d
after reading 4 characters:
unget failed at attempt 4
rereading: 'bcd'
Next character: e
after reading 5 characters:
unget failed at attempt 4
rereading: 'cde'
Next character:
after reading 6 characters:
unget failed at attempt 4
rereading: 'de
'
reached
*/
Some stream implementations offer the members form() (and scan()) for stream objects; the standard C++ streams do not. In this section the construction of a class oformstream is described, which is derived from ostream and does support form() and vform(). Analogously to oformstream a class iscanstream can be developed, supporting scan() and vscan(). The construction of this latter class is left as an exercise to the reader.
Here is the class interface and definition. It is annotated below:
#include <iostream>
class oformstream: public std::ostream // 1
{
public:
oformstream(std::ostream &ostr) // 2
:
std::ostream(ostr.rdbuf())
{}
std::ostream &form(char const *fmt, ...); // 3
};
At 1, the class oformstream is derived from std::ostream, so it inherits ostream's capabilities.
At 2, its constructor expects a reference to a std::ostream object; the ostream base class is initialized with that stream's streambuf.
At 3, the member form() is declared. Its definition will be given shortly.
A program using the class oformstream is given in the next
example. It prints a well-known string and some marker text:
#include "oformstream.h"
#include <memory>
#include <stdarg.h>
#include <cstdio>
std::ostream &oformstream::form(char const *fmt, ...) // 1
{
va_list list;
va_start(list, fmt);
unsigned n = vsnprintf(0, 0, fmt, list); // 2
va_end(list);
std::auto_ptr<char> buf(new char[n + 1]); // 3
va_start(list, fmt); // the list must be restarted for the second pass
vsprintf(buf.get(), fmt, list); // 4
va_end(list);
return *this << buf.get(); // 5
}
using namespace std;
int main()
{
oformstream
of(cout);
of << "Hello world\n";
cout << "Ok, ";
of << "That's all, folks\n";
of.form("%s\n", "Hello world of C++") << endl;
};
/*
Generated output:
Hello world
Ok, That's all, folks
Hello world of C++
*/
In the above example:
At 1, the member form() is defined.
At 2, vsnprintf() is given a buffer size of 0 and 0 as the pointer to the buffer. According to the ISO/IEC 9899:1999 standard, the function vsnprintf() will then return the number of characters required for the formatted string (not including the final ASCII-Z value) (see the vsnprintf() man page for details and possible alternative implementations of vsnprintf()).
At 3, an auto_ptr is used to manage the dynamically allocated buffer.
At 4, the formatted text is written into the buffer, and at 5 it is inserted into the ostream base class, whereafter the oformstream object itself is returned.
Redirection can also be realized within a C++ program itself, using the ios::rdbuf() member function. By assigning the streambuf of a stream to another stream, both stream objects access the same streambuf, thus realizing redirection at the level of the programming language itself.
It should be realized that this is fine within the context of the C++
program, but if that context is left, the redirection is terminated, as the
operating system does not know about streambuf objects. This happens,
e.g., when a program uses a
system() call to start a subprogram. The
following program uses C++ redirection to redirect the information
inserted into
cout to a file, and then calls
system("echo hello world")
to echo a well-known line of text. Since echo writes its information to the standard output, this information would end up in the program's redirected file if C++'s redirection were recognized by the operating system. However, this is not the case, so hello world still appears at the program's standard output and not in the redirected file. A solution to this problem
involves redirection at the operating system level, for which system calls
like
dup() and
dup2() are available in some operating systems (e.g.,
Unix and friends). Examples of the use of these system calls are given in
section 19.4 in this chapter.
Here is the example of the failing redirection at the system level
following C++ redirection using streambuf redirection:
#include <iostream>
#include <fstream>
#include <cstdlib>
using namespace std;
int main()
{
ofstream
of("outfile");
cout.rdbuf(of.rdbuf());
cout << "To the of stream" << endl;
system("echo hello world");
cout << "To the of stream" << endl;
}
/*
Generated output: on the file `outfile'
To the of stream
To the of stream
On standard output:
hello world
*/
The fork() system call is well known: when a program needs to start a new process, system() can be used, but this requires the program to wait for the child process to terminate. The more general way to spawn subprocesses is to call fork().
In this section we will see how C++ can be used to wrap classes around a
complex system call like fork(). Much of what follows in this section
directly applies to the
Unix
operating system, and the discussion will
therefore focus on that operating system. However, other systems usually
provide comparable facilities. The following discussion is based heavily on
the notion of
design patterns, as published by Gamma et al. (1995).
When fork() is called, the current program is duplicated in memory,
thus creating a new process, and both processes continue their execution just
below the fork() system call. The two processes may, however, inspect the
return value of fork(): the return value in the original process (called
the
parent process) differs from the return value
in the newly created process (called the
child process):
In the parent process, fork() returns the process ID of the child process that was created by the fork() system call. This is a positive integer value.
In the child process, fork() returns 0.
If fork() fails, -1 is returned.
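Before turning to the Fork class, a minimal sketch using plain fork() may illustrate this return-value logic (this sketch is not part of the class developed below; its messages are purely illustrative):
#include <iostream>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
int main()
{
    pid_t pid = fork();
    if (pid < 0)                    // fork() failed
        return 1;
    if (pid == 0)                   // in the child, fork() returned 0
    {
        std::cout << "child: my pid is " << getpid() << "\n";
        return 0;                   // the child should exit here
    }
                                    // in the parent, pid is the child's pid
    std::cout << "parent: the child's pid is " << pid << "\n";
    int status;
    waitpid(pid, &status, 0);       // wait for the child to terminate
}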
A basic Fork class should hide the bookkeeping details of a system
call like fork() from its users. The Fork class developed below will
do just that. The class itself only needs to take care of the proper execution
of the fork() system call. Normally, fork() is called to start a child
process, usually boiling down to the execution of a separate process. This
child process may expect input at its standard input stream and/or may
generate output to its standard output and/or standard error streams. Fork
does not know all this, and does not have to know what the child process is.
However, a Fork object should be able to activate the child
process.
Unfortunately, it is not known to the constructor of the Fork class
what actions the child process should perform. Similarly, it is not known
what actions the parent process should perform. For this particular situation,
the
template method design pattern
was developed. According to Gamma c.s., the template method
design pattern
``Define(s) the skeleton of an algorithm in an operation, deferring some steps to subclasses. (The) Template Method (design pattern) lets subclasses redefine certain steps of an algorithm, without changing the algorithm's structure.''
This design pattern allows us to define a
base class in which the
essential steps of the fork() system call have already been implemented,
deferring the implementation of parts of the normal use of the fork()
system call to subclasses.
In particular, the Fork base class will implement:
the fork() member function, which performs the actual forking (i.e., it creates the (new) child process);
the virtual destructor ~Fork(), which may be overridden by derived classes;
the getPid() member function, allowing derived classes to access the system fork()'s return value;
the member int waitForChild(), which can be called by a parent process to wait for the completion of its child process (as discussed below).
Then, the following essential steps of the normal use of fork() are left to be implemented by derived classes:
virtual void childProcess(): the member defining the actions to be taken by the child process;
virtual void parentProcess(): the member defining the actions to be taken by the parent process;
virtual void childRedirections(): this member should be implemented if any of the standard streams (cin, cout or cerr) must be redirected in the child process;
virtual void parentRedirections(): this member should be implemented if any of the standard streams (cin, cout or cerr) must be redirected in the parent process.
The header file implementing the Fork base class can now be
constructed. It is:
class Fork
{
int d_pid;
public:
virtual ~Fork()
{}
void fork();
protected:
int getPid()
{
return d_pid;
}
virtual void childRedirections() // do child redirections
{}
virtual void childProcess() = 0; // must be implemented
virtual void parentRedirections() // do parent redirections
{}
virtual void parentProcess() = 0; // must be implemented
int waitForChild(); // returns the status
};
The implementation of the member function fork() provides for the
calling of the system function fork() (Caution: since the system function
fork() is called in the member function having the same name, the ::
scope resolution operator must be used to prevent a recursive call of the
member function itself), whereafter, depending on the return value of
::fork(), either the parentProcess() or the childProcess() is
called. Redirection of the standard streams may be necessary. The redirections are to be defined in childRedirections() and parentRedirections():
#include "fork.ih"
void Fork::fork()
{
if ((d_pid = ::fork()) < 0)
throw "Fork::fork() failed";
if (d_pid == 0) // childprocess has pid == 0
{
childRedirections();
childProcess();
exit (1); // we shouldn't come here:
// childProcess should exit
}
parentRedirections();
parentProcess();
}
In fork.cc the class'
internal header file fork.ih is
included. This header file takes care of the inclusion of the necessary system
header files, as well as the inclusion of fork.h itself. Its
implementation (omitting the
redundant include guards) is:
#include "fork.h"
#include <cstdlib>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
Normally, the child process does not return: once it has completed its task it
should terminate as a separate process. The parent process usually waits for
its child process to complete. The terminating child process informs
its parent that it is about to terminate by sending a
signal to the
parent, which should be caught by the parent. If the child terminates and the parent process does not catch the signal generated by the child process, then the child process remains visible as a so-called zombie process. If the
parent has to wait for the child, the member waitForChild() can be called
in the implementation of the function parentProcess() to wait for the
child's termination signal. Its implementation is:
#include "fork.ih"
int Fork::waitForChild()
{
int status;
waitpid(d_pid, &status, 0);
return WEXITSTATUS(status);
}
This simple implementation basically returns the child's exit status
to the parent. The called system function
waitpid() blocks until the
child terminates.
There is, however, a situation where the child process continues to
live, but the parent dies. In nature this happens all the time: parents
tend to die before their children do. In our context (i.e. C++), this
represents a
daemon program, where the parent process dies and the child
program actually continues as a child of the basic
init process. Again,
when the child eventually dies, a signal is sent to its `step-parent' init. No zombie is created here, as init catches the termination signals of all its (step-)children.
The construction of a daemon process is so simple, that no special Fork
object is needed (see section 19.4.1 for an example).
A common use of fork() is to start a child process while the parent process terminates immediately after spawning the child process. If this happens, the child process continues to run as a child process of init, the always running first process on Unix systems. Such a process is often called a daemon, running as a background process.
Although the following example can easily be constructed as a plain C
program, it was included in the C++ Annotations because it is so closely
related to the current discussion of the Fork class. I thought about
adding a daemon() member to that class, but eventually decided against it
because the construction of a daemon program is very simple and uses none of
the features currently offered by the Fork class.
Here is an example illustrating the construction of a daemon program:
#include <iostream>
#include <unistd.h>
#include <cstdlib>
#include "fork.h"
class Daemon: public Fork
{
public:
virtual void parentProcess() // the parent does nothing.
{}
virtual void childProcess()
{
sleep(3); // actions taken by the child
// just a message...
std::cout << "Hello from the child process\n";
exit (0); // The child process should
} // exit here.
};
int main()
{
Daemon daemon;
daemon.fork(); // program immediately returns
return 0;
}
/*
Generated output:
The next command prompt, then after 3 seconds:
Hello from the child process
*/
A pipe, created using the pipe() system call, consists of a pair of connected file descriptors. One of the file descriptors is used for writing, the other file descriptor is used for reading. Next, the program forks (i.e., the fork() function is called), which duplicates the file descriptors: both the child process and the parent process now have their own copies of the two file descriptors.
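As a minimal sketch, using only the raw system calls (the Pipe class constructed below encapsulates these steps), the child process might write a single message into the pipe, which the parent then reads and displays:
#include <iostream>
#include <sys/wait.h>
#include <unistd.h>
int main()
{
    int fd[2];
    if (pipe(fd) != 0)              // fd[0]: reading end, fd[1]: writing end
        return 1;
    if (fork() == 0)                // the child process:
    {
        close(fd[0]);               // it only writes, so close the reading end
        char const msg[] = "hello through the pipe\n";
        write(fd[1], msg, sizeof(msg) - 1);
        return 0;
    }
    close(fd[1]);                   // the parent only reads
    char buf[100];
    int n = read(fd[0], buf, sizeof(buf));
    if (n > 0)
        std::cout.write(buf, n);
    wait(0);                        // reap the terminated child process
}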
These steps are encapsulated by the Pipe class that is constructed here. The interface of the class is:
class Pipe
{
private:
enum RW { READ, WRITE };
int d_fd[2];
void redirect(int d_fd, int alternateFd);
public:
Pipe();
int useForReading();
void useForReadingFrom(int fileDescriptor);
int useForWriting();
void useForWritingTo(int fileDescriptor);
void useForWritingTo(int const *fileDescriptors, unsigned n = 2);
};
The constructor creates a set of associated file descriptors using the pipe() system call. The class has two sets of member functions:
useForReading() and useForReadingFrom() can be used to set up the reading end of the pipe. The former member returns a file descriptor that can be used to read from the pipe; the latter sets up redirection, so that a specific file descriptor (usually STDIN_FILENO) may be used to read from the pipe.
useForWriting() and useForWritingTo() can be used to set up the writing end of the pipe. The former member returns a file descriptor that can be used to write to the pipe; the latter comes in two overloaded versions:
useForWritingTo(int fileDescriptor) is used to set up single redirection, so that a specific file descriptor (usually STDOUT_FILENO or STDERR_FILENO) may be used to write to the pipe;
useForWritingTo(int const *fileDescriptors, unsigned n = 2) may be used to set up multiple redirections, by providing an array argument containing the file descriptors whose information must be redirected to the pipe.
The members of the class Pipe were implemented as follows:
#include "pipe.h"
#include <unistd.h>
Pipe::Pipe()
{
if (pipe(d_fd))
throw "Pipe::Pipe(): pipe() failed";
}
void Pipe::redirect(int d_fd, int alternateFd)
{
if (dup2(d_fd, alternateFd) < 0)
throw "Pipe: redirection failed";
}
int Pipe::useForReading()
{
close(d_fd[WRITE]);
return d_fd[READ];
}
void Pipe::useForReadingFrom(int fd)
{
useForReading();
redirect(d_fd[READ], fd);
close(d_fd[READ]);
}
int Pipe::useForWriting()
{
close(d_fd[READ]);
return d_fd[WRITE];
}
void Pipe::useForWritingTo(int const *fd, unsigned n)
{
useForWriting();
for (int idx = 0; idx < n; idx++)
redirect(d_fd[WRITE], fd[idx]);
close(d_fd[WRITE]);
}
void Pipe::useForWritingTo(int fd)
{
useForWritingTo(&fd, 1);
}
Now that the classes Fork and Pipe are available, it's time for an application. The class ParentSlurp, derived from Fork, is a class which starts a child process that execs a program (like /bin/ls). The (standard) output of the program is read by the parent, which will (for demonstration purposes) write the received lines to its standard output stream, prepending line numbers to the lines. It is most convenient here to redirect the parent's standard input stream, so that the parent can read the output from the child process from its std::cin input stream.
In this case, the following members must be implemented:
the constructor of ParentSlurp; its Pipe data member d_pipe is initialized automatically by Pipe's default constructor;
the childRedirections() member, setting up the child's end of the pipe as the writing end, redirecting the child's STDOUT_FILENO to write to the pipe;
the parentRedirections() member, setting up the parent's end of the pipe as the reading end, redirecting the parent's STDIN_FILENO to read from the pipe;
the childProcess() member, exec-ing a program (which will write information to its standard output);
the parentProcess() member, slurping the child's output from its standard input, to be processed further.
Here is the ParentSlurp class:
#include "fork.h"
#include "pipe.h"
#include <unistd.h>
class ParentSlurp: public Fork
{
Pipe d_pipe;
public:
ParentSlurp()
{}
protected:
virtual void childRedirections()
{
d_pipe.useForWritingTo(STDOUT_FILENO);
}
virtual void parentRedirections()
{
d_pipe.useForReadingFrom(STDIN_FILENO);
}
virtual void childProcess()
{
execl("/bin/ls", "/bin/ls", 0);
}
virtual void parentProcess();
};
And here is the implementation. The parentProcess() member repeatedly
reads lines from its standard input, until all lines have been read. Then it
waits for the child to complete. The main() function simply constructs a
ParentSlurp object, and calls its fork() member. Output should consist
of a numbered list of files in the directory where the program is started. An
example is provided with the implementation given below. Note that the program
also needs the fork.o, pipe.o and waitforchild.o object files (see the
earlier sources):
#include "parentslurp.h"
#include <string>
#include <iostream>
void ParentSlurp::parentProcess()
{
std::string line;
unsigned int nr = 1;
while (getline(std::cin, line))
std::cout << nr++ << ": " << line << std::endl;
waitForChild();
}
int main()
{
ParentSlurp ps;
ps.fork();
return 0;
}
/*
Generated Output (example only; actually obtained output may differ):
1: a.out
2: bitand.h
3: bitfunctional
4: bitnot.h
5: daemon.cc
6: fdinseek.cc
7: fdinseek.h
...
*/
The next example is a monitor program, starting child processes and communicating with them using the following commands:
start will start a new child process. The parent will return its ID (a number) to the user. The ID may thereupon be used to send a message to the child;
<nr> text will send ``text'' to the child process with ID <nr>;
stop <nr> will terminate the child process having ID <nr>;
exit will terminate the parent as well as all its children.
A problem with programs like these is that they allow input from multiple
sources: input may appear from the standard input as well as from
the read-side of the communication pipes. Also, multiple output channels are
used. To handle situations like this, the
select() system call was
developed.
The
select() system call was developed to handle synchronous I/O
multiplexing. This system call can be used to handle, e.g., input appearing
synchronously at a set of file descriptors.
Using the select() system call is rather complex, and its full
discussion is beyond the scope of the C++ Annotations.
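To give an impression of what must be done when select() is called directly, here is a minimal sketch (using only the system headers that also appear in the Selector class below) waiting at most five seconds for input to appear at the standard input:
#include <iostream>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>
int main()
{
    fd_set readset;
    FD_ZERO(&readset);                  // start with an empty set, then
    FD_SET(STDIN_FILENO, &readset);     // add the standard input to it
    timeval timeout = { 5, 0 };         // 5 seconds, 0 microseconds
                // the first argument must exceed the highest descriptor used
    int ret = select(STDIN_FILENO + 1, &readset, 0, 0, &timeout);
    if (ret < 0)
        std::cout << "select() failed\n";
    else if (ret == 0)
        std::cout << "timeout: no input within 5 seconds\n";
    else if (FD_ISSET(STDIN_FILENO, &readset))
        std::cout << "input is now available at the standard input\n";
}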
However, its use may be simplified by providing a class Selector,
which hides away the details of using select(). The interface of such a
class is:
#ifndef _INCLUDED_SELECTOR_H_
#define _INCLUDED_SELECTOR_H_
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>
#include <limits.h>
class Selector
{
fd_set d_read;
fd_set d_write;
fd_set d_except;
fd_set d_ret_read;
fd_set d_ret_write;
fd_set d_ret_except;
timeval d_alarm;
int d_max;
int d_ret;
int d_readidx;
int d_writeidx;
int d_exceptidx;
public:
Selector();
int wait();
int nReady()
{
return d_ret;
}
int getReadFd()
{
return checkSet(&d_readidx, d_ret_read);
}
int getWriteFd()
{
return checkSet(&d_writeidx, d_ret_write);
}
int getExceptFd()
{
return checkSet(&d_exceptidx, d_ret_except);
}
void setAlarm(int sec, int usec = 0)
{
d_alarm.tv_sec = sec;
d_alarm.tv_usec = usec;
}
void noAlarm()
{
setAlarm(INT_MAX, INT_MAX);
}
void addReadFd(int fd)
{
addFd(&d_read, fd);
}
void addWriteFd(int fd)
{
addFd(&d_write, fd);
}
void addExceptFd(int fd)
{
addFd(&d_except, fd);
}
void rmReadFd(int fd)
{
FD_CLR(fd, &d_read);
}
void rmWriteFd(int fd)
{
FD_CLR(fd, &d_write);
}
void rmExceptFd(int fd)
{
FD_CLR(fd, &d_except);
}
private:
int checkSet(int *index, fd_set &set);
void addFd(fd_set *set, int fd);
};
#endif
The class has a rather extensive interface, but that's mainly the consequence of select() being a complex function. Its members are:
Selector(): the constructor. It requires no
arguments.
int wait(): this function will block until activity is sensed at any of the file descriptors that are monitored by the Selector object, or until the alarm times out. It will throw an exception when select(), called by wait(), fails.
int nReady(): its return value is defined only when wait() has returned. In that case its return value is 0 for an alarm-timeout, -1 if select() failed, and otherwise the number of file descriptors on which activity was sensed.
int getReadFd(): its return value is defined
only when wait() has returned. In that case its return value is -1 if no
more input file descriptors are available. Otherwise the next available file
descriptor for reading is returned.
int getWriteFd(): analogous to getReadFd(), but it relates to the set of output-related file descriptors.
int getExceptFd(): analogous to getReadFd(), but it relates to the set of exception-related file descriptors.
void setAlarm(int sec, int usec = 0): this
member sets the alarm-time. At least the number of seconds to wait for the
alarm to go off is required.
void noAlarm(): this member switches off the
alarm.
void addReadFd(int fd): this member adds
a file descriptor to the set of input file descriptors monitored by the
Selector object.
void addWriteFd(int fd): this member adds a
file descriptor to the set of output file descriptors monitored by the
Selector object.
void addExceptFd(int fd): this member adds
a file descriptor to the set of exception file descriptors to be monitored by
the Selector object.
void rmReadFd(int fd): this member removes a
file descriptor from the set of input file descriptors monitored by the
Selector object.
void rmWriteFd(int fd): this member removes a
file descriptor from the set of output file descriptors monitored by the
Selector object.
void rmExceptFd(int fd): this member removes
a file descriptor from the set of exception file descriptors to be monitored
by the Selector object.
The constructor and the members wait(), addFd() and checkSet() were implemented as follows:
#include "selector.h"
Selector::Selector()
{
FD_ZERO(&d_read);
FD_ZERO(&d_write);
FD_ZERO(&d_except);
noAlarm();
d_max = 0;
}
int Selector::wait()
{
timeval t = d_alarm;
d_ret_read = d_read;
d_ret_write = d_write;
d_ret_except = d_except;
d_readidx = 0;
d_writeidx = 0;
d_exceptidx = 0;
d_ret = select(d_max, &d_ret_read, &d_ret_write, &d_ret_except, &t);
if (d_ret < 0)
throw "Selector::wait()/select() failed";
return d_ret;
}
void Selector::addFd(fd_set *set, int fd)
{
FD_SET(fd, set);
if (fd >= d_max)
d_max = fd + 1;
}
int Selector::checkSet(int *index, fd_set &set)
{
int &idx = *index;
while (idx < d_max && !FD_ISSET(idx, &set))
++idx;
return idx == d_max ? -1 : idx++;
}
The child-processes started by the monitor are defined by the class
Child. The Child class is derived from the class Fork, so
childProcess() and parentProcess() members must be defined. However,
in this case the parent process is empty, as all communications with the child
process take place in the Monitor object (see section 19.4.4.3).
The Child object will set up redirections in such a way that the child process itself can simply use its standard input and standard output, and it will make the opposite ends of the corresponding pipes known to the parent process via two auxiliary functions, getReadFd() and getWriteFd(). Note that the reading end of a pipe in the child process corresponds to the writing end of a pipe in the parent process (and vice versa). The Child
object is responsible for the bookkeeping that's involved here: the user of
the Child process shouldn't have to juggle with the opposite ends of
pipes, but should be given the appropriate ends, including their
functions. Hence, getReadFd(), which is used by the parent process, will
produce the file descriptor from which the parent may read information sent to
it by its child.
Apart from this noteworthy characteristic, Child's interface doesn't
contain special features. Here it is:
#include "pipe.h"
#include "fork.h"
#ifndef _INCLUDED_SELECTOR_H_
#include "selector.h"
#endif
class Child: public Fork
{
Pipe d_in;
Pipe d_out;
Selector d_selector;
int d_parentReadFd;
int d_parentWriteFd;
int d_nr;
public:
Child(int nr)
:
d_nr(nr)
{}
int getReadFd()
{
return d_parentReadFd;
}
int getWriteFd()
{
return d_parentWriteFd;
}
int getPid()
{
return Fork::getPid();
}
int getNr()
{
return d_nr;
}
virtual void childRedirections();
virtual void parentRedirections();
virtual void childProcess();
virtual void parentProcess()
{}
};
The implementation of the member functions is given below. Please pay some
attention to the way the ends of the two pipes are used: in
childRedirections() the redirection to the standard streams takes place,
in parentRedirections() the d_out pipe's reading file descriptor is
assigned to d_parentReadFd, which is then returned by getReadFd(). The
analogous situation holds true for the d_in pipe.
Next, childProcess() adds STDIN_FILENO to its Selector object's set of read file descriptors. Since redirection has taken place, the child process can simply concentrate on the redirected file descriptor.
Then, in a perpetual loop the child process waits for the Selector object
to return from its wait() call. When the alarm goes off, it sends a
message to its standard output (hence, into the writing pipe) that it's still
standing by, otherwise it will echo the messages appearing at its standard
input to its standard output.
Here is the implementation of the Child members:
#include "child.h"
#include <iostream>
#include <string>
using namespace std;
void Child::childRedirections()
{
d_in.useForReadingFrom(STDIN_FILENO);
d_out.useForWritingTo(STDOUT_FILENO);
}
void Child::parentRedirections()
{
d_parentReadFd = d_out.useForReading();
d_parentWriteFd = d_in.useForWriting();
}
void Child::childProcess()
{
Selector selector;
unsigned message = 0;
selector.addReadFd(STDIN_FILENO);
selector.setAlarm(5);
while (true)
{
try
{
if (!selector.wait()) // timeout
cout << "Child " << d_nr << ": standing by\n";
else
{
string line;
getline(cin, line);
cout << "Child " << d_nr << ":" << ++message << ": " <<
line << endl;
}
}
catch (...)
{
cout << "Child " << d_nr << ":" << ++message << ": " <<
"select() failed" << endl;
}
}
exit(0);
}
The Monitor object does all the work. Its public interface is rather simple, as it just contains the function to start the whole process. Here is the interface of the class Monitor. Note that it contains a nested class Find. This nested class is discussed below, at the description of the stopChild() member function:
#include <string>
#include <map>
#ifndef _INCLUDED_SELECTOR_H_
#include "selector.h"
#endif
#include "child.h"
class Monitor
{
enum Commands
{
UNKNOWN,
START,
EXIT,
STOP,
TEXT
};
Selector d_selector;
int d_nr;
std::map<int, Child *> d_child;
public:
Monitor()
:
d_nr(0)
{}
void run();
private:
static void waitForChild(int signum);
static void killChild(std::map<int, Child *>::value_type it);
int next(int &value, std::string &line);
void processInput();
void processChild(int fd);
void exiting();
void stopChild(int value);
void sendChild(int value, std::string const &line);
void createNewChild();
class Find
{
int d_nr;
public:
Find(int nr)
:
d_nr(nr)
{}
bool operator()(std::map<int, Child *>::value_type &vt)
const
{
return d_nr == vt.second->getNr();
}
};
};
Apart from the constructor, there is just one public member: run(). This member does the actual child-handling, as follows:
Terminating child processes send a SIGCHLD signal, which must be caught; the static member waitForChild() will be installed to do just that.
Initially, the Monitor object will listen only to its standard input: the set of input file descriptors to which select() listens is initialized to STDIN_FILENO.
Then, in a loop, the Selector's wait() function is called. If input on cin is available, it is processed by processInput(); otherwise, the input has arrived from a child process, and it is processed by processChild().
The member run() was implemented as follows:
#include "monitor.ih"
void Monitor::run()
{
signal(SIGCHLD, waitForChild);
d_selector.addReadFd(STDIN_FILENO);
while (true)
{
cout << "? " << flush;
try
{
d_selector.wait();
int fd;
while ((fd = d_selector.getReadFd()) != -1)
{
if (fd == STDIN_FILENO)
processInput();
else
processChild(fd);
}
}
catch (...)
{
cerr << "select failed, exiting\n";
exiting();
}
}
}
The member processInput() reads the commands from the user via the
standard input. The member itself is rather simple: it calls next() to
obtain the next command entered by the user, and then calls the corresponding
function. This member and its helper member are not complicated. The sources,
which were kept rather basic, will hardly need further elaboration at this
point in the Annotations:
#include "monitor.ih"
void Monitor::processInput()
{
string line;
int value;
switch (next(value, line))
{
case START:
createNewChild();
break;
case EXIT:
exiting();
break;
case STOP:
stopChild(value);
break;
case TEXT:
sendChild(value, line);
break;
default:
cout << "unknown: " << line << "\n";
break;
}
}
int Monitor::next(int &value, string &line)
{
if (!getline(cin, line))
throw "Command::next(): reading cin failed";
if (line == "start")
return START;
if (line == "exit")
return EXIT;
if (line.find("stop") == 0)
{
istringstream istr(line.substr(4));
istr >> value;
return !istr ? UNKNOWN : STOP;
}
istringstream istr(line.c_str());
istr >> value;
if (istr)
{
getline(istr, line);
return TEXT;
}
return UNKNOWN;
}
All remaining input sensed by select() originated in a child
process. The processChild() member is called with the appropriate file
descriptor. Then, using an ifdstreambuf (see section 19.1.2.1), the input appearing at the file descriptor is read from an input stream. The communication protocol that is used here is also rather basic: to every
line of input sent to a child, the child sends exactly one line
of text in return. Consequently, processChild() can just read the line of
text, and return:
#include "monitor.ih"
void Monitor::processChild(int fd)
{
ifdstreambuf ifdbuf(fd);
istream istr(&ifdbuf);
string line;
getline(istr, line);
cout << d_child[fd]->getPid() << ": " << line << endl;
}
Please note the construction d_child[fd]->getPid() that is used in
the above source. Looking at the interface of Monitor, a map<int, Child
*> is observed. This map contains the input file descriptor as its key,
and a pointer to the Child object as its value. A pointer is used here,
rather than a Child object, since we do want to use the facilities offered
by the map, but don't want to copy a Child object.
The responsibility to destroy the Child object once it becomes superfluous now rests, of course, with the programmer, and no longer with the run-time support system.
A new element is added to the map by the member createNewChild(), which is called in response to the user entering the start command.
#include "monitor.ih"
void Monitor::processChild(int fd)
{
ifdstreambuf ifdbuf(fd);
istream istr(&ifdbuf);
string line;
getline(istr, line);
cout << d_child[fd]->getPid() << ": " << line << endl;
}
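Here is a minimal sketch of createNewChild(), following the steps described next (the exact wording of the `Child ... started' confirmation is an assumption, based on the example session shown at the end of this section):
#include "monitor.ih"
void Monitor::createNewChild()
{
    Child *cp = new Child(++d_nr);      // create the next child
    cp->fork();                         // only the parent process returns here
    int fd = cp->getReadFd();           // monitor the child's output
    d_selector.addReadFd(fd);
    d_child[fd] = cp;                   // store the child by its read-fd
    cout << "Child " << d_nr << " started\n";
}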
Here the new processes are started. Once the Child object is created,
its fork() member is called. This will (as fork() uses the template
method design pattern, see section 19.4) call the child- and parent-
related functions in, respectively, the child- and parent-processes. As the
child process never returns (see section 19.4.4.2), and
Child::parentProcess() is an empty function, the code below the
cp->fork() call is only executed in the parent process, which is the
main program. This code simply adds the file descriptor that's available for
reading information from the child to the set of input file descriptors
monitored by select(), and stores the pointer to the Child object in
the d_child map.
Direct communication with the child is required as the result of the stop <nr> and <nr> text commands, which will, respectively, terminate child process <nr> and send the text `text' to child process <nr>. In both cases, the program has to figure out which process belongs to a certain child number. Here the find_if() generic algorithm (see section 17.4.14) was used to look for the requested child-number in the d_child map. The definition of the class Find could be part of the internal header file monitor.ih, or, alternatively, it could be defined as a nested private class in Monitor's class interface. Here the latter approach was followed: Find is defined as a nested class of Monitor.
The stopChild() member looks for the requested child process number
and, if found, removes its input file descriptor from the set monitored by the
select() system call. Then the child process is terminated by calling
killChild(), whereafter the Child object is erased from the
map. Note that this erasure does not destroy the Child object, so before
erasing the map-element, the Child object is deleted first:
#include "monitor.ih"
void Monitor::stopChild(int nr)
{
map<int, Child *>::iterator it =
find_if(d_child.begin(), d_child.end(), Find(nr));
if (it != d_child.end())
{
d_selector.rmReadFd(it->second->getReadFd());
killChild(*it);
delete it->second;
d_child.erase(it);
}
else
cerr << "No child number " << nr << endl;
}
Comparably, the sendChild() member sends some information to a
particular child process. The same lookup action is required here as with
stopChild(): using a Monitor::Find object the validity of the requested child process number is checked, whereafter an ofdnstreambuf object is constructed (see section 19.1.1) to be used with an ostream object, to send information to the child.
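Here is a minimal sketch of sendChild(), analogous to stopChild() shown above (the buffer size passed to the ofdnstreambuf object is an arbitrary choice):
#include "monitor.ih"
void Monitor::sendChild(int nr, string const &line)
{
    map<int, Child *>::iterator it =
        find_if(d_child.begin(), d_child.end(), Find(nr));
    if (it == d_child.end())
    {
        cerr << "No child number " << nr << endl;
        return;
    }
    ofdnstreambuf ofdbuf(it->second->getWriteFd(), 100);
    ostream ostr(&ofdbuf);
    ostr << line << endl;               // endl flushes the line to the child
}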
Three member functions remain to be discussed:
waitForChild() is activated by
SIGCHLD signals sent by
terminating children. The Monitor object simply acknowledges the
termination by catching the signal, which is further ignored.
exiting() terminates all child processes, by visiting all elements of
the d_child map, using the
for_each() generic algorithm (see section
17.4.17). Since the function exits the program, the destruction of the
Child objects is superfluous, and is skipped.
killChild() terminates a child process by sending it the
SIGTERM
signal.
#include "monitor.ih"
void Monitor::waitForChild(int signum)
{
int status;
wait(&status);
signal(SIGCHLD, waitForChild);
}
void Monitor::killChild(map<int, Child *>::value_type it)
{
if (kill(it.second->getPid(), SIGTERM))
cerr << "Couldn't kill process " << it.second->getPid() << endl;
}
void Monitor::exiting()
{
for_each(d_child.begin(), d_child.end(), killChild);
exit(0);
}
Finally, the program's main() function is simply:
#include "monitor.h"
int main()
{
Monitor monitor;
monitor.run();
}
/*
Example of a session:
# a.out
? start
Child 1 started
? 1 hello world
? 3394: Child 1:1: hello world
? 1 hi there!
? 3394: Child 1:2: hi there!
? start
Child 2 started
? 3394: Child 1: standing by
? 3395: Child 2: standing by
? 3394: Child 1: standing by
? s3395: Child 2: standing by
? top 1
? 3395: Child 2: standing by
? 2 hello world
? 3395: Child 2:1: hello world
? 1 hello world
No child number 1
? exit3395: Child 2: standing by
?
#
*/
For whatever reason, some operators are missing: there are no predefined
function objects corresponding to
bitwise operations. However, their
construction is, given the available predefined function objects, not
difficult. The following examples show a
template class implementing a
function object calling the
bitwise and (
operator&()), and a template
class implementing a function object calling the
unary not
(
operator~()). Other missing operators can be constructed similarly. This
is, however, left as an
exercise to the reader.
Here is the implementation of a function object calling the
bitwise
operator&():
#include <functional>
template <typename _Tp>
struct bit_and : public std::binary_function<_Tp,_Tp,_Tp>
{
_Tp operator()(const _Tp& __x, const _Tp& __y) const
{
return __x & __y;
}
};
Here is the implementation of a function object calling operator~():
#include <functional>
template <typename _Tp>
struct bit_not : public std::unary_function<_Tp,_Tp>
{
_Tp operator()(const _Tp& __x) const
{
return ~__x;
}
};
These and other
missing predefined function objects
are also implemented in the
file
bitfunctional, which is found in one of the cplusplus
zip-archives.
An example using bit_and() to remove all odd numbers from a vector of
int values, using the
remove_if()
generic algorithm is:
#include <iostream>
#include <iterator>
#include <algorithm>
#include <vector>
#include "bitand.h"
using namespace std;
int main()
{
vector<int>
vi;
for (int idx = 0; idx < 10; ++idx)
vi.push_back(idx);
copy
(
vi.begin(),
remove_if(vi.begin(), vi.end(), bind2nd(bit_and<int>(), 1)),
ostream_iterator<int>(cout, " ")
);
cout << endl;
}
/*
Generated output:
0 2 4 6 8
*/
In this section the class Vector, introduced in section 18.3, is extended to contain a RandomAccessIterator reverse_iterator.
First, we note that applying the ordinary iterators of the Vector class (made available by the begin() and end() member functions) to the generic algorithms is without any problem, as the pointers that are returned by these members are accepted by the generic algorithms. Here is a Vector template class:
#include <algorithm>
template <typename Type>
class Vector
{
typedef Type *iterator;
public:
Vector()
{
init(0);
};
Vector(unsigned n)
{
init(n);
}
Vector(Vector<Type> const &other)
{
construct(other);
}
~Vector()
{
delete [] data;
}
Vector<Type> const &operator=(Vector<Type> const &other)
{
if (this != &other)
{
delete [] data;
construct(other);
}
return *this;
}
Type &operator[](unsigned index) throw(char const *)
{
if (index >= (beyond - data))
throw "Vector array index out of bounds";
return data[index];
}
void push_back(Type const &value)
{
if (!data)
{
init(1);
beyond = data;
}
else if (beyond == end_of_storage)
{
Vector<Type>
enlarged((end_of_storage - data) << 1);
copy(data, beyond, enlarged.data);
delete [] data;
beyond = enlarged.data + (beyond - data);
data = enlarged.data;
end_of_storage = enlarged.end_of_storage;
enlarged.data = 0;
}
*beyond++ = value;
}
iterator begin()
{
return data;
}
iterator end()
{
return beyond;
}
unsigned size()
{
return beyond - data;
}
private:
void init(unsigned n)
{
if (n)
{
data = new Type[n];
beyond = data + n;
end_of_storage = data + n;
}
else
{
data = 0;
beyond = 0;
end_of_storage = 0;
}
}
void construct(Vector<Type> const &other)
{
init(other.beyond - other.data);
copy(other.data, other.beyond, data);
}
Type
*data,
*beyond,
*end_of_storage;
};
And a simple program using Vector:
#include <iostream>
#include "vector.h"
using namespace std;
int main()
{
Vector<int>
vi;
for (int idx = 0; idx < 10; ++idx)
vi.push_back(idx);
random_shuffle(vi.begin(), vi.end());
copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " "));
cout << endl;
}
/*
generated output, e.g.,
1 7 9 3 5 6 0 4 8 2
*/
However, the use of a reverse iterator is not without problems: the
reverse iterator shows rather complex behavior, in that it visits the elements
in its range backwards, resulting in a reversal of the meanings of the
operators that are used to step forward and step backward in a range. If a
reverse iterator rit points to element 11 in a range, then it will point
to element 10, rather than 12 after ++rit. Because of this complex
behavior, the reverse iterator will be implemented as a class
reverse_iterator, which can do the required bookkeeping for us.
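The reverse iterators of the standard vector already show this behavior. Here is a minimal sketch illustrating that the increment operator steps backward through the range:
#include <iostream>
#include <vector>
using namespace std;

int main()
{
    vector<int> v;
    for (int value = 10; value <= 12; ++value)
        v.push_back(value);                     // v contains 10 11 12

    vector<int>::reverse_iterator rit = v.rbegin() + 1;
    cout << *rit << endl;                       // shows 11
    ++rit;                                      // `forward' for a reverse iterator
    cout << *rit << endl;                       // shows 10, not 12
}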
Furthermore, in the current context reverse_iterator is a reverse
iterator only for the elements of the class Vector. So, a
nested class seems to be appropriate here. Since Vector has
already been defined, we will decide to leave it untouched.
Rather than modifying Vector, a template class Vector2 is derived
of it, and Vector2 will be given its nested class reverse_iterator.
This latter class can be derived from std::random_access_iterator to
realize a
RandomAccessIterator for the class Vector2:
template <typename Type>
class Vector2: public Vector<Type>
{
public:
class reverse_iterator:
public std::random_access_iterator<Type, ptrdiff_t>
{
friend class Vector2<Type>; // see the text, below
...
};
...
};
The
ptrdiff_t type that is mentioned here is ordinarily defined as an
int (e.g., with the
Linux
operating system it is defined as a
__kernel_ptrdiff_t in
types.h, which is defined as an int in
posix_types.h).
The reverse_iterator class is now defined as a nested template class,
derived from the template class random_access_iterator. Its surrounding
class is Vector2, which itself is a template class that was derived from
the template class Vector:
#include <iterator>
#include "vector.h"
template <typename Type>
class Vector2: public Vector<Type>
{
public:
Vector2()
:
Vector<Type>()
{}
Vector2(unsigned n)
:
Vector<Type>(n)
{}
class reverse_iterator:
public std::random_access_iterator<Type, ptrdiff_t>
{
friend class Vector2<Type>;
reverse_iterator(Type *begin, Type *end);
Type *d_begin;
Type *d_end;
public:
reverse_iterator(reverse_iterator const &other);
// reverse_iterator &operator=(reverse_iterator const &other);
Type &operator*() const;
bool operator<(reverse_iterator const &other) const;
bool operator==(reverse_iterator const &other) const;
bool operator!=(reverse_iterator const &other) const
{
return !(*this == other);
}
reverse_iterator &operator++();
reverse_iterator operator++(int);
reverse_iterator &operator--();
reverse_iterator operator--(int);
int operator-(reverse_iterator const &other) const;
reverse_iterator operator+(int step) const;
reverse_iterator operator-(int step) const;
};
reverse_iterator rbegin()
{
return reverse_iterator(begin(), end());
}
reverse_iterator rend()
{
return reverse_iterator(begin(), begin());
}
};
Within reverse_iterator we store two Type pointers, called
begin and end, which will be used to define the reversed range
[end - 1, begin - 1). In the
private section of reverse_iterator
we declare:
Type *d_begin;
Type *d_end;
The members d_begin and d_end receive their initial values from
the begin() and end() members of the Vector object.
However, the class reverse_iterator is closely tied to Vector2:
only Vector2 objects should be allowed to construct their own reverse
iterators. This policy is enforced by making Vector2<Type> a
bound friend of reverse_iterator, and by declaring its
constructor private. The constructor expects the two Type pointers,
defining the range [begin(), end()). Internally begin() - 1 will be
used as endpoint of the reversed range, while end() - 1 will be used
as the beginning of the reversed range.
The constructor, defined in the private section of the
reverse_iterator class, is therefore:
reverse_iterator(Type *begin, Type *end)
:
d_begin(begin),
d_end(end)
{}
In the implementation of reverse_iterator, d_end will gradually
walk towards d_begin, which will keep its value. The dereferencing
operator will not return *d_begin, but rather d_end[-1]: the element
just before the element to which d_end points. All this is allowed, as
Vector::begin() and Vector::end() return random access iterators
themselves, for which
pointer arithmetic operations are defined.
Since the reverse_iterator class declares Vector2<Type> as a
friend, the Vector2 class can implement two members, rbegin() and
rend() returning reverse_iterator objects:
reverse_iterator rbegin()
{
return reverse_iterator(begin(), end());
}
reverse_iterator rend()
{
return reverse_iterator(begin(), begin());
}
Apart from these two members, the Vector2 class could define two
constructors, analogous to the constructors in the Vector
class. E.g.,
Vector2()
:
Vector<Type>()
{}
In order to allow programs to use these reverse iterators, a public
copy constructor and overloaded
assignment operator are needed in the
class reverse_iterator.
Since the Type pointers in reverse_iterator are merely used for
pointing to existing information, a
destructor
is not required. Of course, this implies that the usefulness of a
reverse_iterator terminates if the object from which it was obtained goes
out of scope. It is the
responsibility of the programmer to make sure this doesn't cause
disaster. In the public section of reverse_iterator we add the following
members (implementations given as inline code):
reverse_iterator(reverse_iterator const &other)
:
d_begin(other.d_begin),
d_end(other.d_end)
{}
reverse_iterator &operator=(reverse_iterator const &other)
{
d_begin = other.d_begin;
d_end = other.d_end;
return *this;
}
The implementation shown below is fairly
Spartan: illegal pointer
values are not detected. As the generic algorithms do not require these checks
we've left them out. Of course, one could modify the implementation and
incorporate checks at several locations. It's up to the reader.
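For instance, a range-checked dereference operator could be sketched as follows, throwing a char const * the way Vector's operator[] does. Whether such a check is worth its cost is a design decision left to the reader:
Type &operator*() const
{
    if (d_end <= d_begin)       // nothing left to dereference:
                                // the iterator has reached rend()
        throw "reverse_iterator: dereferencing beyond rend()";
    return d_end[-1];
}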
Random access iterators require us to implement a series of member
functions. Here they are, as they should appear in the
public section of
the reverse_iterator nested class. Most implementations will be
self-explanatory, and consist of single lines of code:
#include "vector2.h"
#define VectorRevIt Vector2<Type>::reverse_iterator
template <typename Type>
VectorRevIt::reverse_iterator(reverse_iterator const &other)
:
d_begin(other.d_begin),
d_end(other.d_end)
{}
template <typename Type>
VectorRevIt::reverse_iterator(Type *begin, Type *end)
:
d_begin(begin),
d_end(end)
{}
template <typename Type>
Type &VectorRevIt::operator*() const
{
return d_end[-1];
}
template <typename Type>
Vector2<Type>::reverse_iterator VectorRevIt::operator++(int)
{
return reverse_iterator(d_begin, d_end--);
}
template <typename Type>
Vector2<Type>::reverse_iterator &VectorRevIt::operator++()
{
--d_end;
return *this;
}
template <typename Type>
Vector2<Type>::reverse_iterator &VectorRevIt::operator--()
{
++d_end;
return *this;
}
template <typename Type>
Vector2<Type>::reverse_iterator VectorRevIt::operator--(int)
{
return reverse_iterator(d_begin, d_end++);
}
template <typename Type>
bool VectorRevIt::operator<(reverse_iterator const &other) const
{
return d_end > other.d_end;
}
template <typename Type>
bool VectorRevIt::operator==(reverse_iterator const &other) const
{
return other.d_end == d_end;
}
template <typename Type>
int VectorRevIt::operator-(reverse_iterator const &other) const
{
return other.d_end - d_end;
}
template <typename Type>
Vector2<Type>::reverse_iterator VectorRevIt::operator+(int step) const
{
return reverse_iterator(d_begin, d_end - step);
}
template <typename Type>
Vector2<Type>::reverse_iterator VectorRevIt::operator-(int step) const
{
return operator+(-step);
}
The following members deserve special attention:
bool operator<(reverse_iterator const &other) const
This function compares two iterators. The operator<() must be interpreted as `the lvalue iterator points to an element earlier in the range than the rvalue iterator'. Since we're talking reversed iterators here, this means that the lvalue operator points to a later element than the rvalue operator in the normal iterator range. So, the overloaded operator should return true if this->d_end > other.d_end, implying that the current iterator points to an element that is located earlier in the reversed range than the other iterator. So, we implement: return d_end > other.d_end;
reverse_iterator operator++(int)
The postfix increment operator must return a reverse_iterator pointing to the current element, while incrementing the iterator itself. Again, this implies that d_end is decremented. Since we have a constructor expecting two (normal) iterators, the implementation again is a one-liner: return reverse_iterator(d_begin, d_end--); Using the postfix decrement with d_end, a reverse_iterator object is returned representing the iterator's position at the time the postfix increment operator was called, incrementing the reversed iterator (so, decrementing d_end) as a side effect.
reverse_iterator operator--(int)
Implemented analogously:
return reverse_iterator(d_begin, d_end++);
int operator-(reverse_iterator const &other) const
This is the pointer arithmetic operator: it returns the difference between two pointers, representing the number of steps to make to reach the lvalue operand starting from the rvalue operand. Normally we would implement this as d_end - other.d_end. However, since reversed iterators reduce their values to reach elements later in the range, the algebraic sign must be negated with reversed iterators: -(d_end - other.d_end), or, using simple algebraic manipulation: return other.d_end - d_end;
reverse_iterator operator-(int step) const
This operator allows us to take random steps backward using a
random access iterator. With a reversed iterator this means that we must
increase the value of the iterator. So we return the following
reverse_iterator object:
return reverse_iterator(d_begin, d_end + step);
which is exactly what the actual implementation, return operator+(-step);, amounts to.
reverse_iterator operator+(int step) const
This operator does the opposite: stepping forward through the reversed range means that d_end must be decreased. So we implement it accordingly:
return reverse_iterator(d_begin, d_end - step);
Finally, note the fully written-out syntax that is required when members of the nested class reverse_iterator are defined outside of the class interface (above, the VectorRevIt macro was used to abbreviate this):
template <typename Type>                    // this is a template function
                                            // the class in which the
Vector2<Type>::reverse_iterator::           // constructor is defined
reverse_iterator(Type *begin, Type *end)    // the constructor itself
:
d_begin(begin),
d_end(end)
{}
template <typename Type>                    // this is a template function
Vector2<Type>::reverse_iterator             // this is its return type
                                            // the class in which the function
Vector2<Type>::reverse_iterator::           // is defined
operator+(int step) const                   // this is the function itself
{
return reverse_iterator(d_begin, d_end - step);
}
Finally, an example of a program using the reverse_iterator:
#include <iostream>
#include "vector3.h"
using namespace std;
int main()
{
Vector2<int>
vi;
for (int idx = 0; idx < 10; ++idx)
vi.push_back(idx);
copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " "));
cout << endl;
random_shuffle(vi.rbegin(), vi.rend());
copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " "));
cout << endl;
sort(vi.rbegin(), vi.rend());
copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " "));
cout << endl;
reverse(vi.rbegin(), vi.rend());
copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " "));
cout << endl;
copy(vi.rbegin(), vi.rend(), ostream_iterator<int>(cout, " "));
cout << endl;
}
/*
Generated output:
0 1 2 3 4 5 6 7 8 9
7 1 5 9 3 4 6 0 2 8
9 8 7 6 5 4 3 2 1 0
0 1 2 3 4 5 6 7 8 9
9 8 7 6 5 4 3 2 1 0
*/
The current example assumes that the reader knows how to use the
scanner generator
flex and the
parser generator
bison. Both
bison and flex are well documented elsewhere. The original
predecessors of bison and flex, called
yacc and
lex are
described in several books, e.g. in
O'Reilly's book `lex & yacc'.
However, scanner- and parser generators are also (and maybe even more
commonly, nowadays) available as free software. Both bison and flex
can be obtained from
ftp://prep.ai.mit.edu/pub/gnu. Flex
will create a C++ class when called as flex++, or when the
-+ flag
is used.
For bison the situation is a bit more complex. Scattered over the
Internet several bison++ archives can be found (e.g., in
ftp://ftp.rug.nl/contrib/frank/software/linux/bison/ or
rzbsdi01.uni-trier.de). The
information in these archives usually dates back to 1993, irrespective of the
version number mentioned with the archive itself.
Using flex++ and bison++ a class-based scanner and parser can be
generated. The advantage of this approach is that the interface to the scanner
and the parser tends to become somewhat cleaner than without using the
class interface.
Below two examples are given. In the first example only a lexical scanner is
used to monitor the production of a file from several parts. This example
focuses on the lexical scanner, and on switching files while churning through
the information. The second example uses both a scanner and a parser to
transform standard arithmetic expressions to their postfix notation, commonly
encountered in code generated by compilers and in HP-calculators. The
second example focuses on the parser.
The setup is as follows: The
input-language knows of an
#include
statement, which is followed by a string indicating the file which should be
included at the location of the #include.
In order to avoid complexities that are irrelevant to the current
example, the format of the #include statement is restricted to the
form
#include <filepath>.
The file specified between the pointed brackets should be available at
the location indicated by filepath. If the file is not available,
the program terminates using a proper error message.
The program is started with one or two filename arguments. If the program is
started with just one filename argument, the output is written to the
standard output stream cout. Otherwise, the output is written to
the stream whose name is given as the program's second argument.
The program uses a maximum include-nesting depth. Once this maximum is exceeded, the program terminates with an appropriate error message. In that case, the filename stack is printed as well, showing which file was included where.
One small extra feature is that
comment-lines should be recognized:
include directives in comment-lines should be ignored, comment being the
standard C++ comment-types.
The program is created in the following steps:
First, a lexer specification file (lexer) is constructed, containing the
specifications of the input-language.
From this specification the requirements for the
class Scanner evolve. The Scanner class is a
wrapper around the class
yyFlexLexer generated by
flex++. The requirements result in the
specification of the
interface for the class Scanner.
Finally, the program's main() function is constructed. A Startup object
is created to inspect the
command-line arguments. If successful,
the scanner's member yylex() is called to construct the
output file.
19.7.1.1: The flex++ specification file
The organization of the
lexical scanner specification file is similar
to the one used with flex. However, flex++ creates a class
(
yyFlexLexer) from which the class Scanner will be derived.
The code associated with the
regular expression rules will be located
inside the class yyFlexLexer. However, it would be handy to access the
member functions of the derived class within that code.
Fortunately,
inheritance helps us to realize this. In the specification of
the class yyFlexLexer(), we notice that the function
yylex() is a
virtual function. In the FlexLexer.h header file we see virtual
int yylex():
class yyFlexLexer: public FlexLexer
{
public:
yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 );
virtual ~yyFlexLexer();
void yy_switch_to_buffer( struct yy_buffer_state* new_buffer );
struct yy_buffer_state* yy_create_buffer( istream* s, int size );
void yy_delete_buffer( struct yy_buffer_state* b );
void yyrestart( istream* s );
virtual int yylex();
virtual void switch_streams( istream* new_in, ostream* new_out );
};
Consequently, if yylex() is defined in a derived class, then this
function of the derived class will be called from a
base class (i.e.,
yyFlexLexer) pointer. Since the yylex() function of the
derived class is called, that function will have access to the members
of its class, and to the public and protected members of its base class.
The context in which the generated scanner is placed is (by
default)
the function
yyFlexLexer::yylex(). However, this context can be changed by
defining the
YY_DECL-macro. This macro, if defined, determines the context
in which the generated scanner will be placed. So, in order to make the
generated scanner part of the derived class function yylex(), three
things must be done:
First, YY_DECL must be defined in the lexer specification
file. It must define the derived class function yylex() as the scanner
function. For example:
#define YY_DECL int Scanner::yylex()
Second, the function yylex() must be declared in the class definition of
the derived class.
Third, although yyFlexLexer::yylex() is a virtual function, it
must still be defined. It is not called, though, so its definition may be a
simple:
int yyFlexLexer::yylex()
{
return 0;
}
Both the YY_DECL macro and the yyFlexLexer::yylex()
function can conveniently be placed in the lexer specification file, as shown
below.
Looking at the regular expressions themselves, notice that we'll need rules
for the recognition of the comment, for the
#include directive, and for the remaining characters. This is all fairly
standard practice. When an #include directive is detected, the directive
is parsed by the scanner. This too is
common practice. Here is what will
happen in the current setup:
The scanner tries to extract the filename from the directive, throwing the
Scanner::Error value (invalidInclude) if this
fails;
the extracted filename is stored in the data member nextSource;
at the end of the #include directive
the member function performSwitch() is called to perform the switch to another file;
when the end of a file (EOF) is detected, the derived class'
member function popSource() is called, which will pop the previously
pushed file, returning
true.
When there are no more files to pop, popSource() returns false,
resulting in the call of yyterminate(), which will terminate the scanner.
The lexical scanner specification file has three sections:
the C++ preamble, containing declarations and code used by the code
associated with the regular expressions;
the flex++ symbol area, used for the definition of symbols, like a mini
scanner, and for options, like %option yylineno when the
lexical scanner should keep track of the
line numbers of the files it is
scanning;
the rules area, in which the regular expressions and their actions are
defined. The default action is to copy the matched text from the istream *yyin
to the
ostream *yyout, for which the predefined macro
ECHO can be used. The complete and annotated lexical scanner
specification file to be processed by
flex++ is:
%{
/* ----------------------------------------------------------------------------
C++ -preamble.
Include header files, other than those generated by flex++ and bison++.
E.g., include the interface to the class derived from yyFlexLexer
----------------------------------------------------------------------------*/
// the yylex() function that's actually
// used
#define YY_DECL int Scanner::yylex()
#include "scanner.ih" // The interface of the derived class
int yyFlexLexer::yylex() // not called: overruled by
{ // Scanner::yylex()
return 0;
}
%}
/* ----------------------------------------------------------------------------
Flex++ symbol area
~~~~~~~~~~~~~~~~~~
The symbols mentioned here are used for defining e.g., a mini scanner
---------------------------------------------------------------------------- */
%x comment
%x include
%option yylineno
eolnComment "//".*
anyChar .|\n
/* ----------------------------------------------------------------------------
Flex rules area:
~~~~~~~~~~~~~~~~
Regular expressions below here define what the lexer will recognize.
---------------------------------------------------------------------------- */
%%
/*
The comment-rules: comment lines are ignored.
*/
{eolnComment}
"/*" BEGIN comment;
<comment>{anyChar}
<comment>"*/" BEGIN INITIAL;
/*
File switching: #include <filepath>
*/
#include[ \t]+"<" BEGIN include;
<include>[^ \t>]+ nextSource = yytext;
<include>">"[ \t]*\n {
BEGIN INITIAL;
performSwitch();
}
<include>{anyChar} throw invalidInclude;
/*
The default rules: eating all the rest, echoing it to output
*/
{anyChar} ECHO;
/*
The <<EOF>> rule: pop a pushed file, or terminate the lexer
*/
<<EOF>> {
if (!popSource())
yyterminate();
}
Since the derived class is able to access the information stored within
the lexical scanner itself (it can even access the information directly,
since the data members of yyFlexLexer are
protected, and thus
accessible to derived classes), most processing can be done by the derived
class' member functions. This results in a very clean setup of the lexer
specification file, which requires hardly any code in the preamble.
19.7.1.2: The derived class: Scanner
The class Scanner is derived from the class
yyFlexLexer as generated
by
flex++. The derived class has access to the data controlled by the
lexical scanner. In particular, the derived class has access to the following
data
members:
char *yytext: contains the
text
matched by a
regular expression (accessible outside of the scanner from the
YYText() member);
int yyleng: the
length of
the text in yytext (accessible outside of the scanner from the
YYLeng() member);
int yylineno: the current
line number (only if
%option yylineno
was specified in the
lexer specification file, accessible outside of the scanner
from the
lineno() member).
These data members are declared in the file FlexLexer.h, which is part of the
flex distribution.
The class Scanner performs two tasks:
it performs the switch to another file when an #include directive is encountered;
it switches back to the previously read file when EOF is detected in an
included file.
For the Scanner class the final
header file is given below. Note
that, apart from the private member functions, several private data members
are used as well. These members are initialized by the Scanner()
constructor and are used in the private member functions. They are
discussed below, in the context of the member functions that are using
them. Here is the scanner.h header file:
#include <FlexLexer.h> // provides yyFlexLexer interface
#include <fstream>
#include <string>
#include <stack>
#include <vector>
class Scanner: public yyFlexLexer
{
public:
enum Error
{
invalidInclude,
circularInclusion,
nestingTooDeep,
cantRead,
};
Scanner(istream *yyin, string const &initialName);
string const &lastFile()
{
return fileName.back();
}
void stackTrace(); // dumps filename stack to cerr
int yylex(); // overruling yyFlexLexer's yylex()
private:
Scanner(Scanner const &other); // no Scanner copy-initialization
Scanner &operator=(Scanner const &other); // no assignment either
bool popSource(); // true: source popped
void performSwitch();
void throwOnCircularInclusion();
stack<yy_buffer_state *>
state;
vector<string>
fileName;
string
nextSource;
static unsigned const
sizeof_buffer = 16384,
maxDepth = 10;
};
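The internal header file scanner.ih, which is included by the Scanner source files shown below, is not given in this example. A minimal sketch of what it could contain is (an assumption; its exact contents are up to the implementor):
#include <iostream>         // cerr, used by, e.g., stackTrace()
#include <algorithm>        // find(), used by throwOnCircularInclusion()
using namespace std;        // scanner.h uses unqualified standard names,
                            // so the using directive precedes it
#include "scanner.h"        // the Scanner class interface shown above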
The scanning process proceeds as follows:
Once the scanner extracts a filename from an #include directive, a
switch to another file is performed by performSwitch(). If the filename
could not be extracted, the scanner throws an invalidInclude
exception
value.
The performSwitch() member and the matching function popSource()
handle the file switch. In particular, note that
yylineno is not updated
when a
file switch is performed. If line numbers are to be monitored, then
the current value of yylineno should be pushed on a stack, and
yylineno should be reset by performSwitch(), while popSource()
should reinstate a former value of yylineno by popping a previously pushed
value from the stack.
In the current implementation of Scanner a simple
stack of
yy_buffer_state pointers is maintained. Changing that into a stack of
pair<yy_buffer_state *, unsigned> elements allows you to save the line
numbers. This modification is left as an
exercise to the reader.
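A sketch of that modification is given below. Since yylineno is a protected data member of the lexer base class it may be assigned to; the pair's second field holds the saved line number:
// the data member replacing the plain stack of buffer states:
stack<pair<yy_buffer_state *, unsigned> > state;

// in performSwitch(), push the buffer state together with the
// current line number, and restart counting in the new file:
state.push(make_pair(yy_current_buffer, static_cast<unsigned>(yylineno)));
yylineno = 1;

// in popSource(), reinstate both the buffer and the line number:
yy_delete_buffer(yy_current_buffer);
yy_switch_to_buffer(state.top().first);
yylineno = state.top().second;
state.pop();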
As mentioned, performSwitch() performs the actual file-switching. The
yyFlexLexer class provides a series of member functions that can be used
for file switching purposes. The file-switching capability of a
yyFlexLexer object is founded on the struct yy_buffer_state,
containing the state of the scan-buffer of the file that is currently scanned
by the lexical scanner. This buffer is pushed on the state stack when an
#include is encountered. Then this buffer is replaced by the buffer that
will be created for the file that is mentioned in the #include directive.
The actual
file switching is realized as follows:
First, the current depth of the include-nesting is inspected. If
maxDepth is reached, the stack is considered full, and the scanner throws
a nestingTooDeep exception.
Next, the fileName vector is inspected, to avoid
circular inclusions. If nextSource is encountered in the
fileName vector, the inclusion is refused, and the scanner throws a
circularInclusion exception. The member function
throwOnCircularInclusion() is called for this task.
Then, a new ifstream object is created for the filename
in nextSource. If this fails, the scanner throws a cantRead exception.
Finally, a new yy_buffer_state is created for the newly opened
stream, and the lexical scanner is instructed to switch to that
stream using yyFlexLexer's member function
yy_switch_to_buffer().
The implementations of performSwitch() and
throwOnCircularInclusion() are:
#include "scanner.ih"
void Scanner::performSwitch()
{
if (state.size() == maxDepth)
throw nestingTooDeep;
fileName.push_back(nextSource);
throwOnCircularInclusion();
ifstream *newStream = new ifstream(nextSource.c_str());
if (!*newStream)
throw cantRead;
state.push(yy_current_buffer);
yy_switch_to_buffer(yy_create_buffer(newStream, sizeof_buffer));
}
#include "scanner.ih"
void Scanner::throwOnCircularInclusion()
{
vector<string>::iterator
it = find(fileName.begin(), fileName.end(), nextSource);
if (it != fileName.end())
throw circularInclusion;
}
The member function popSource() is called to pop the previously pushed
buffer from the stack, to continue its scan just beyond the just processed
#include directive. The popSource() function first inspects
the size of the state stack: if it is empty, false is returned and the
function terminates. If it is not empty, the current buffer is
deleted, to be replaced by the state waiting on top of the stack. The file
switch is performed by the yyFlexLexer members
yy_delete_buffer() and
yy_switch_to_buffer. Note that yy_delete_buffer() takes care of the
closing of the ifstream and of deleting the memory allocated for this
stream in performSwitch(). Furthermore, the filename that was last entered
in the fileName vector is removed. Having done all this, the function
returns true:
#include "scanner.ih"
bool Scanner::popSource()
{
if (state.empty())
return false;
yy_delete_buffer(yy_current_buffer);
yy_switch_to_buffer(state.top());
state.pop();
fileName.pop_back();
return true;
}
These functions complete the implementation of the lexical
scanner. The generated scanning code itself is stored in the Scanner::yylex()
function. The Scanner object itself has only three public member
functions and a constructor. One member function is yylex(); the other two
allow the caller to dump a stack trace and to obtain the name of the file that
was last processed by the scanner. The constructor is a simple piece of
code. Here is its source:
#include "scanner.h"
Scanner::Scanner(istream *yyin, string const &initialName)
{
switch_streams(yyin, yyout);
fileName.push_back(initialName);
}
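The stackTrace() member, declared in scanner.h but not shown so far, merely has to dump the fileName vector to cerr. A sketch of an implementation (the exact output format is our own assumption):
#include "scanner.ih"

void Scanner::stackTrace()
{
    for (unsigned idx = 0; idx < fileName.size(); ++idx)
        cerr << idx << ": " << fileName[idx] << endl;
}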
19.7.1.3: The main() function
The main program is very simple. As the program expects a filename to start
the scanning process at, initially the number of arguments is checked. If at
least one argument was given, an ifstream object is created. If this
object can be created, then a Scanner object is constructed, receiving the
address of the ifstream object and the name of the file as its
arguments. Then the yylex() member function of the Scanner object is
called. This member overrides the yylex() member declared in Scanner's base class
yyFlexLexer. If anything fails inside the scanner, an exception is thrown,
which is caught at the end of the main() function:
/* lexer.cc
A C++ main()-frame generated by C++ for lexer.cc
*/
#include "lexer.h" /* program header file */
int main(int argc, char **argv)
{
if (argc == 1)
{
cerr << "Filename argument required\n";
exit (1);
}
ifstream yyin(argv[1]);
if (!yyin)
{
cerr << "Can't read " << argv[1] << endl;
exit(1);
}
Scanner scanner(&yyin, argv[1]);
try
{
return scanner.yylex();
}
catch (Scanner::Error err)
{
char const *msg[] =
{
"Include specification",
"Circular Include",
"Nesting",
"Read",
};
cerr << msg[err] << " error in " << scanner.lastFile() <<
", line " << scanner.lineno() << endl;
scanner.stackTrace();
return 1;
}
}
19.7.1.4: Building the scanner-program
The final program is constructed in two steps. These steps are given
for a
Unix system, on which
flex++ and the
Gnu C++
compiler
g++ have been installed:
First, the lexical scanner's source is created using flex++. For
this the following command can be given:
flex++ lexer
Next, the program is compiled and linked against the libfl.a library.
The appropriate command here is:
g++ -o scanner *.cc -lfl
NOTE: Don't forget to link against libfl.a!
If flex++ is called with the
-d flag, debugging code will be part of the
generated scanner. Apart from that, the debugging code must be
activated. Assuming the scanner object is called scanner, the statement
scanner.set_debug(1); must be given after the construction of the scanner object.
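For example, in the main() function shown above the debugging code could be activated right after the scanner's construction (assuming the scanner was generated with -d):
Scanner scanner(&yyin, argv[1]);
scanner.set_debug(1);           // activate the scanner's debugging output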
Starting point for a program that uses both a parser and a scanner is the grammar: the grammar is specified first. This results in a set of tokens which can be returned by the lexical scanner (commonly called the lexer).
Finally, auxiliary code is provided to `fill in the blanks': the actions which are performed by the parser and the lexer are not normally specified in the grammatical rules or lexical regular expressions, but are executed by member functions, which are called from within the parser's rules or are associated with the lexer's regular expressions.
In the previous section we've seen an example of a C++ class generated by
flex++. In the current section we concern ourselves with the parser. The
parser can be generated from a grammar specification to be processed by the
program
bison++. The grammar specification required for bison++ is
similar to the specifications required for
bison, but bison++ will
generate a class, rather than a single function.
In the next sections we'll develop a program converting
infix expressions, in which binary operators are written between
their operands, to
postfix expressions, in which binary operators are
written behind their operands. Comparably, the
unary operator - will be converted from its prefix notation to a postfix
form (We ignore the unary + operator here for the sake of brevity).
Our calculator will recognize a minimal set of operators: multiplication, addition, parentheses, and the unary minus. We'll distinguish real numbers from integers, to illustrate a subtlety in the bison-like grammar specifications, but that's about it: the purpose of this section is, after all, to illustrate a C++ program, using a parser and a lexer, and not to construct a full-fledged calculator.
In the upcoming sections we'll develop the grammar specification for
bison++. Then, the regular expressions for the scanner
are specified according to the requirements of flex++. Finally the program
is constructed.
The class-generating bison software (bison++) is not widely available. The version used by us is 2.20. It can be obtained at ftp://ftp.rug.nl/contrib/frank/software/linux/bison/bison++2.20.tar.gz.
19.7.2.1: The bison++ specification file
The
grammar specification file used for
bison++ is comparable to the
specification file used for
bison. Differences are related to the class
nature of the resulting parser. Our calculator will distinguish real numbers
from integers, and will support the basic set of arithmetic operators.
The bison++ specification file consists of the following sections:
The header section. This section is comparable to the declaration section used with bison, the
difference being the
%header{...} opening. In this section we'll encounter
mainly
declarations: header files are included, among which is
the scanner's header file.
The token section. Apart from the usual token declarations, bison++ has several extra items that can be declared
here. They are important and are discussed in a section of their own.
The rules section, containing the grammar rules as in a standard bison program.
19.7.2.2: The bison++ token section
The token section contains all the tokens that are used in the grammar, as well as the priority rules as used for the mathematical operators. Moreover, several new items can be declared here:
%name ParserName. The name ParserName will be
the name of the parser's class. This entry should be the first entry of the
token section. It is used in cases where
multiple grammars are used, to
make sure that the different parser-classes use unique identifiers. By
default the name
parse is used.
%define name content. The %define has the same
function as the #define statement for the C++ preprocessor. It can be
used to define, e.g., a
macro. Internally, the defined symbol will be the
concatenation of
YY_, the parser's class name, and the name of the
macro. E.g.,
YY_ParserName_name
Several symbols will normally be defined here. Of all the definitions that
can be given here, the
following two are
required:
%define LEX_BODY { inline-code }:
here the body of the call to the
lexer is defined. It can be defined as
= 0 for an abstract parser-class, but otherwise it must contain the code
(including surrounding curly braces) representing the call to the lexer. For
example, if the lexer object generated by flex++ is called lexer, this
declaration should be
%define LEX_BODY { return lexer.yylex(); }
%define ERROR_BODY { inline-code }:
similarly, the body of the code of the call to the error-function is defined
here. It can be defined as = 0, in which case the parser's class will
again become abstract. Otherwise, it is used to specify the inner workings of
the error function, including surrounding braces. E.g.,
%define ERROR_BODY { cerr << "syntax Error\n"; }
When the
LEX_BODY and
ERROR_BODY definitions are omitted, then the
compiler is not able to construct the
virtual table of the parser class,
and the linker will report an error like
undefined reference to `Parser virtual table'
The remaining symbols are optional, and can be defined when needed:
%define DEBUG 1: if non-0 debugging
code will be included in the parser's source. However, in order to make the
debugging code active, the member variable yydebug should be set to
1. This can be done in, e.g., the constructor code or the constructor member
initialization section (see below) (Apparently, %define DEBUG 1
isn't required. Just setting yydebug = 1 appears to be all that is needed
to activate the parser's debugging code).
%define ERROR_VERBOSE: if
defined, the parser's stack will be dumped when an error occurs.
%define LVAL yylval: the default
variable name is shown here: the variable name containing the parser's
semantic value is by
default
yylval, but its name may be redefined
here.
%define INHERIT : public ClassA,
public ClassB: the
inheritance list for the parser's class. Note that it
starts with a
colon. The space character between INHERIT
and the colon may be omitted. The %define itself should be omitted if the
parser class is not derived from another class.
%define MEMBERS member-prototypes: if
the parser
contains extra members, they must be declared
here. Note that there is only one %define MEMBERS definition allowed. So,
if multiple members are to be declared, they must all be declared at this
point. To prevent
very long lines in
the specification file, the \ (
backslash) can be used at the end of a line,
to indicate that it continues on the next line of the source-text. E.g.,
%define MEMBERS void lookup(); \
void lookdown();
The MEMBERS section starts as a
public section. If
private members are required too, a private:
directive can be
used in the MEMBERS section.
To define the parser's constructor, the following %defines can be used:
%define
CONSTRUCTOR_PARAM parameterlist: this defines the parameterlist for the
parser's constructor. Here the types and names of the parameters of the
parser's constructor should be given. The surrounding
parentheses of the
parameter list are not part of the CONSTRUCTOR_PARAM definition.
%define CONSTRUCTOR_INIT
: initializer(s): this defines the
base class and
member initializers of the constructor. Note the
initial
colon following CONSTRUCTOR_INIT, which is required. The colon
may be given immediately after the CONSTRUCTOR_INIT statement, or blanks
may be used to separate the symbol from the colon.
%define CONSTRUCTOR_CODE
{ code }: this defines the code of the parser's constructor (including
surrounding curly braces).
If none of these constructor-related defines are used, the parser class gets a default constructor, and (using the default class name) a parser object can simply be defined as:
parse parser;
%union. This starts the definition of the
semantic value union. It replaces the
#define YYSTYPE definition seen
with bison. An example of a %union declaration is
%union
{
int i;
double d;
};
The union
cannot contain objects as its fields,
as
constructors
cannot be called when a union is created. This means that
a string cannot be a member of the
union. A string *, however, is a possible union member. As a side
line: the
lexical scanner does not have to know about this union. The
scanner can simply pass its scanned text to the parser through its
YYText()
member function. For example, a
statement like:
$$.i = atoi(scanner.YYText()); can be used to convert matched text to a value of an appropriate type.
Once a union field has been associated with a non-terminal or a token, the plain $$ or the
generic return values $1, $2, etc., that are associated with components of
rules can be used, rather than $$.i, $3.d, etc. To associate a
non-terminal or a token with a union field, the
<fieldname> specification is
used. E.g.,
%token <i> INT // token association (deprecated, see below)
<d> DOUBLE
%type <i> intExpr // non-terminal association
In this example, note that both the tokens and the non-terminals can be
associated with a field of the union. However, as noted earlier, the lexical
scanner does not have to know about all this. In our opinion, it is cleaner to
let the scanner do just one thing: scan texts. The parser knows what the input
is all about, and may convert strings like "123" to an integer
value. Consequently, we are discouraging the association of a union field and
a
token. In the upcoming
description of the rules of the grammar this will be illustrated further.
Closely related to the %union discussion, the
%token and
%type specifications should be noted. They are used for
the specification of the tokens (
terminal symbols) that can be returned by
the lexical scanner, and for the specification of the return types of
non-terminals. Apart from %token the
token indicators
%left,
%right and
%nonassoc may be used to specify the
associativity of operators. The tokens
mentioned at these indicators are interpreted as tokens indicating operators,
associating in the indicated direction. The
precedence of operators is given
by their order: the first specification has the lowest priority. To overrule a
certain precedence in a certain context,
%prec can be
used. As all this is standard bison practice, it isn't further discussed
in this context. The documentation provided with the bison distribution
should be consulted for further reference.
19.7.2.3: The bison++ grammar rules
The rules and actions of the grammar are specified as usual. The grammar for our little calculator is given below. A lot of rules, but they illustrate the use of non-terminals associated with value-types.
lines:
lines
line
|
line
;
line:
intExpr
'\n'
{
cerr << "int: " << $1 << endl;
}
|
doubleExpr
'\n'
{
cerr << "double: " << $1 << endl;
}
|
'\n'
{
cout << "Good bye\n";
YYACCEPT;
}
|
error
'\n'
;
intExpr:
intExpr '*' intExpr
{
$$ = $1 * $3;
}
|
intExpr '+' intExpr
{
$$ = $1 + $3;
}
|
'(' intExpr ')'
{
$$ = $2;
}
|
'-' intExpr %prec UnaryMinus
{
$$ = -$2;
}
|
INT
{
$$ = atoi(lexer.YYText());
}
;
doubleExpr:
doubleExpr '*' doubleExpr
{
$$ = $1 * $3;
}
|
doubleExpr '+' doubleExpr
{
$$ = $1 + $3;
}
|
doubleExpr '*' intExpr
{
$$ = $1 * $3;
}
|
doubleExpr '+' intExpr
{
$$ = $1 + $3;
}
|
intExpr '*' doubleExpr
{
$$ = $1 * $3;
}
|
intExpr '+' doubleExpr
{
$$ = $1 + $3;
}
|
'(' doubleExpr ')'
{
$$ = $2;
}
|
'-' doubleExpr %prec UnaryMinus
{
$$ = -$2;
}
|
DOUBLE
{
$$ = atof(lexer.YYText());
}
;
Using these rules a simple
calculator is defined in which integer and
real values can be negated, added, and multiplied, and in which standard
priority rules can be circumvented using parentheses. The rules show the use
of typed nonterminal symbols: doubleExpr is linked to real (double)
values, intExpr is linked to integer values. Precedence and type
association is defined in the token section of the parser specification file,
which is:
%name Parser
%union
{
int i;
double d;
};
%token INT
DOUBLE
DONE
%type <i> intExpr
<d> doubleExpr
%left '+'
%left '*'
%right UnaryMinus
%define MEMBERS \
virtual ~Parser() {} \
private: \
Scanner lexer;
%define LEX_BODY {return lexer.yylex();}
%define ERROR_BODY { cerr << "error encountered\n"; }
In the token section we see the use of the %type specifiers,
connecting intExpr to the i-field of the semantic-value union, and
connecting doubleExpr to the d-field. At first sight it looks
complex, since the expression rules must be included for each individual
return type. On the other hand, if the union itself would have been used, we
would have had to specify somewhere in the returned semantic values what field
to use: less rules, but more complex and error-prone code.
Also, note
that the lexical scanner is
included as a member of the parser. Although there's really no need here to
define the scanner as a separate Scanner class, doing so shows the
general approach we're advocating here. In the current situation, the
definition of the scanner class is extremely simple:
#include <FlexLexer.h> // provides yyFlexLexer interface
class Scanner: public yyFlexLexer
{
};
In general, the scanner can well be defined as a (private) member of the
parser, as it's not used outside of the parser object. The parser's virtual
destructor is included as a member to prevent the compiler from complaining
about the parser having a non-virtual destructor. E.g.,
warning: `class Parser' has virtual functions but non-virtual destructor
19.7.2.4: The flex++ specification file
The flex-specification file that is used by our calculator is
simple: blanks are skipped, single characters are returned, and
numerical values are returned as either Parser::INT or
Parser::DOUBLE tokens. Here is the complete flex++
specification file:
%{
#include "parser.h"
%}
%option yylineno
%%
[ \t] ;
[0-9]+ return(Parser::INT);
"."[0-9]* |
[0-9]+("."[0-9]*)? return(Parser::DOUBLE);
exit |
quit return (Parser::DONE);
.|\n return (*yytext);
19.7.2.5: The generation of the code
The code
is generated in the
same way as with bison and flex. To order bison++ to generate the
files parser.cc and parser.h, give the command:
bison++ -d -o parser.cc parser
Next,
flex++ will generate code on
lexer.cc using the command
flex++ -I -olexer.cc lexer
Note here that flex++ expects no blanks between the -o flag
and lexer.cc.
On Unix systems, linking and compiling the generated sources and the source for the main program (given below) is realized by the command:
g++ -o calc -Wall *.cc -lfl -s
Note the fact that the program is linked against the
libfl.a library.
If it's not provided, the
linker will complain about
unresolved references like
yywrap().
Finally, here is the source defining the main() function, in which the parser object (having
the lexical scanner as one of its data members) is created:
#include "parser.h"
int main()
{
Parser parser;
cout << "Enter (nested) expressions containing ints, doubles, *, + and "
"unary -\n"
"operators. Enter an empty line, exit or quit to exit.\n";
return parser.yyparse();
}