We're always interested in getting feedback. E-mail us if you like this guide, if you think that important material is omitted, if you encounter errors in the code examples or in the documentation, if you find any typos, or generally just if you feel like e-mailing. Send your email to Frank Brokken.Please state the document version you're referring to, as found in the title (in this document: 5.2.0a) and please state the paragraph you're referring to.
All mail received is seriously considered, and new (sub)releases of the Annotations will normally reflect your suggestions for improvements. Except for the incidental case I will not otherwise acknowledge the receipt of suggestions for improvements. Please don't misinterpret this for lack of appreciation.
This chapter presents a number of concrete examples of programming in
C++. Items from this document such as virtual functions, static
members, etc. are rediscussed. Examples of container classes are shown.
Also, with the advent of the
ANSI/ISO standard, classes supporting streams
based on
file descriptors are no longer available,
including the
Gnu procbuf
extension. These classes were frequently used
in older C++ programs. This section of the C++ Annotations develops an
alternative: classes extending streambuf
, allowing the use of file
descriptors, and classes around the
fork()
system call.
Another example digs into the peculiarities of using a parser- and scanner-generator with C++. Once the input for a program exceeds a certain level of complexity, it's advantageous to use a scanner- and parser-generator for creating the code which does the actual input recognition. The example describes the usage of these tool in a C++ environment.
streambuf
as the
starting point for the construction of classes interfacing file
descriptors. In this section we will discuss the construction of a class which
may be used to write to a device defined by its file descriptor: it may be a
file, but could be a
pipe or
socket. Section 19.1.2 will discuss
the matter of reading from devices that are known by their file descriptors,
while section 19.3 will reconsider the question of redirection,
which was discussed earlier in section 5.8.3.
Basically, deriving a class for
output operations is very simple. The only
member function that must be overridden is the
virtual
int overflow(int
c)
member. This member is responsible for writing
characters to the device in the case where there is no buffer or no more room
in the buffer. If fd
represents the file descriptor to write to, and if we
decide against using a buffer, the member overflow()
can be as simple as:
class UnbufferedFD: public std::streambuf { public: int overflow(int c) { if (c != EOF) { if (write(fd, &static_cast<char>(c), 1) != 1) return EOF; } return c; } ... }The reveived argument is either written as
char
to the file
descriptor, or EOF
is returned. However, this approach does not use an
output buffer. As the use of a buffer is strongly advised (see also the next
section), we will discuss the construction of a class using an output buffer
is somewhat greater detail.
When an output buffer is to be used, the overflow()
member gets a bit more
complex, due to the fact that it is now called only when the buffer is
full. So, first we will have to flush the buffer, for which the (virtual)
function
streambuf::sync()
should be used. Defining sync()
and using
it in overflow()
is not all that must be done: the information to write
might be less than the size of the buffer. So, at the end of the
lifetime
of our special streambuf
object, it might find itself having a buffer that
is only partially full. So, we must make sure that the buffer is flushed by
the time our object goes
out of scope. This is of course very simple:
sync()
should be called in the
destructor as well.
Now that we've considered the consequences of using an output buffer,
we're almost ready to construct our derived class. We add a couple of extra
features. First, we permit the specification of the size of the output buffer
by the user of the class. Second, we permit the construction of an object of
the class before the file descriptor to use is actually known. Later, in
section 19.4 we'll encounter a situation where this feature will be used.
In order to save some space, the successful operation of various calls were
not checked. In `real life' implementations these checks should of course not
be omitted. Here is the class ofdnstreambuf
:
#ifndef _OFDNSTREAMBUF_H_ #define _OFDNSTREAMBUF_H_ #include <unistd.h> #include <streambuf> #include <ios> class ofdnstreambuf: public std::streambuf { unsigned d_bufsize; int d_fd; char *d_buffer; public: ofdnstreambuf() : d_bufsize(0), d_buffer(0) {} ofdnstreambuf(int fd, unsigned bufsize = 1) { open(fd, bufsize); } ~ofdnstreambuf() { if (d_buffer) { sync(); // 1 delete d_buffer; } } void open(int fd, unsigned bufsize = 1) { d_fd = fd; d_bufsize = bufsize == 0 ? 1 : bufsize; d_buffer = new char[d_bufsize]; setp(d_buffer, d_buffer + d_bufsize); // 2 } int sync() { if (pptr() > pbase()) { write(d_fd, d_buffer, pptr() - pbase()); // 3 setp(d_buffer, d_buffer + d_bufsize); // 4 } return 0; } int overflow(int c) { sync(); // 5 if (c != EOF) { *pptr() = static_cast<char>(c); // 6 pbump(1); } return c; } }; #endif
sync()
is
called to write any unprocessed characters in the output buffer to
the device.
open()
member, the buffer is initialized.
Using
setp()
the begin and end points
of the buffer are passed to the streambuf
base class, allowing it
to setup
pbase()
pptr()
and
epptr()
.
sync()
, the
unflushed characters in the buffer are written to the device. Then,
at the next line, the buffer is reinitialized. Note that sync()
should return 0 after a successfull flush operation.
streambuf
are
reset to their initial values by called setp()
once again.
overflow()
,
sync()
is called to flush the now filled up output buffer to the
device. As this recreates an empty buffer, the character c
which
could not be written to the buffer is now entered into the buffer
using the member functions pptr()
and
pbump()
. Notice that entering a character into the buffer is
realized using available member functions of streambuf
.
ofdstreambuf
class to copy its standard input to file descriptor
STDOUT_FILENO
, which is the
symbolic name of the file descriptor used
for the standard output. Here is the program:
#include "fdout.h" #include <string> #include <iostream> #include <istream> using namespace std; int main(int argc) { ofdnstreambuf fds(STDOUT_FILENO, 500); ostream os(&fds); switch (argc) { case 1: os << "COPYING cin LINE BY LINE\n"; for (string s; getline(cin, s); ) os << s << endl; break; case 2: os << "COPYING cin BY EXTRACTING TO os.rdbuf()\n"; cin >> os.rdbuf(); // Alternatively, use: cin >> &fds; break; case 3: os << "COPYING cin BY INSERTING cin.rdbuf() into os\n"; os << cin.rdbuf(); break; } }
streambuf
, the class should use an input buffer of at least one character,
to allow the use of the member functions
istream::putback()
or
istream::ungetc()
. Stream classes (like
istream
) normally allow the
ungetting of at least one character using their member functions putback()
or ungetc()
. This is important, as these stream classes usually interface
to
streambuf
objects. Although strictly speaking it is not necessary to
implement a buffer in classes derived from streambuf
, we strongly suggest
to use buffers is these cases: the implementation is very simple and
straightforward, and the applicability of such classes is greatly
improved. Therefore, in all our classes
derived from the class streambuf
at least a
buffer of one character will be defined.
19.1.2.1: Using a one-character buffer
When deriving a class (e.g.,
ifdstreambuf
) from streambuf
using a
buffer of at least one character, at the minimum the member
streambuf::underflow()
should be overridden, as this is the member to
which all requests for input eventually degenerate. Since also a buffer is
needed, the member
streambuf::setg()
is used to inform the streambuf
baseclass of the dimensions of the input buffer, so that it is able to set up
its input buffer pointers so that
eback()
,
gptr()
, and
egptr()
return correct values. Here is the
corresponding class definition:
#include <unistd.h> #include <streambuf> #include <ios> class ifdstreambuf: public std::streambuf { protected: // 1 int d_fd; char d_buffer[1]; public: ifdstreambuf(int fd) : d_fd(fd) { setg(d_buffer, d_buffer + 1, d_buffer + 1); // 2 } int underflow() { if (gptr() < egptr()) // 3 return *gptr(); // 4 if (read(d_fd, d_buffer, 1) <= 0) // 5 return EOF; setg(d_buffer, d_buffer, d_buffer + 1); // 6 return *gptr(); } };Please note the following:
protected
data members so that derived classes (e.g., see section
19.1.2.3) can access them.
private
section a small
array of one character is defined, and setg()
will set gptr()
equal to
egptr()
. Since this implies that the buffer is empty, underflow()
will
be called to refill the buffer.
underflow()
will first see whether the buffer
is really empty. If not, then the character to be retrieved next is returned.
EOF
is returned. More sophisticated implementations could
react more intelligently here, of course.
setg()
is called once
again to set up streambuf
's buffer pointers correctly.
ifdstreambuf
class is used in the following program:
#include "ifdbuf.h" #include <iostream> #include <istream> using namespace std; int main(int argc) { ifdstreambuf fds(0); istream is(&fds); cout << is.rdbuf(); }
19.1.2.2: Using an n-character buffer
How complex would things get if we decide to use a buffer of substantial
size? Not that complex. The following class allows use to specify the size of
a buffer, and it is basically the same class as ifdstreambuf
from the
previous section. To make things a bit more interesting, in the current class
ifdnstreambuf
the member
streambuf::xsgetn()
was also overridden, to
optimize the reading of series of characters at once. Also, a default
constructor is provided which can be used in combination with the added
open()
member in cases where we want to construct a istream
object
before we know the file descriptor that must be used. Later, in section
19.4 we'll encounter such a situation. In order to save some space, the
successful operation of various calls were not checked. In `real life'
implementations these checks should of course not be omitted. Here is the new
class ifdnstreambuf
:
#ifndef _IFDNBUF_H_ #define _IFDNBUF_H_ #include <unistd.h> #include <streambuf> #include <ios> class ifdnstreambuf: public std::streambuf { protected: int d_fd; unsigned d_bufsize; char* d_buffer; public: ifdnstreambuf() : d_bufsize(0), d_buffer(0) {} ifdnstreambuf(int fd, unsigned bufsize = 1) { open(fd, bufsize); } ~ifdnstreambuf() { if (d_bufsize) { close(d_fd); delete d_buffer; // 1 } } void open(int fd, unsigned bufsize = 1) { d_fd = fd; d_bufsize = bufsize; d_buffer = new char[d_bufsize]; setg(d_buffer, d_buffer + d_bufsize, d_buffer + d_bufsize); } int underflow() { if (gptr() < egptr()) return *gptr(); int nread = read(d_fd, d_buffer, d_bufsize); if (nread <= 0) return EOF; setg(d_buffer, d_buffer, d_buffer + nread); // 2 return *gptr(); } std::streamsize xsgetn(char *dest, std::streamsize n) // 3 { int nread = 0; while (n) { if (!in_avail()) { if (underflow() == EOF) // 4 break; } int avail = in_avail(); if (avail > n) avail = n; memcpy(dest + nread, gptr(), avail); // 5 gbump(avail); // 6 nread += avail; n -= avail; } return nread; } }; #endifWith this class, please note:
ifdnstreambuf
object goed
out of scope.
underflow()
. The
input buffer pointers of streambuf
are now reset in accord with the actual
number of characters read from the device.
xsgetn()
is overridden. In a loop
n
is reduced until 0, at which point the function
terminates. Alternatively (at 4), the member returns if underflow()
fails to obtain more characters. At 5 and 6 reading is
optimized: instead of calling
streambuf::sbumpc()
n
times, a block
of avail
characters is copied to the destination, using
streambuf::gpumb()
to consume avail
characters from the buffer using
one function call.
xsgetn()
is called by
streambuf::sgetn()
,
which is a streambuf
member. The following example illustrates the use of
this member function with a ifdnstreambuf
object:
#include "ifdnbuf.h" #include <iostream> #include <istream> using namespace std; int main(int argc) { ifdnstreambuf fds(0, 30); // internally: 30 char buffer char buf[80]; // main() reads blocks of 80 // chars while (true) { unsigned n = fds.sgetn(buf, 80); if (n == 0) break; cout.write(buf, n); } }
19.1.2.3: Seeking positions in `streambuf' objects
When devices support seek operations, classes derived from
streambuf
should override
streambuf::seekoff()
and
streambuf::seekpos()
. In the
following class
ifdseek
, which is derived from our previously constructed
class
ifdstreambuf
, we assume that the device of which we have a
file descriptor available suports these seeking operations. The class
ifdseek
is derived from ifdstreambuf
, so it uses a character buffer of
just one character. The facilities to perform seek operations, which are added
to our new class ifdseek
will make sure that the input buffer is reset
when a seek operation is requested. The class could also be derived from the
class ifdnstreambuf
, in which case the arguments to reset the input
buffer must be adapted such that its second and third parameter point beyond
the available input buffer. Here is the class definition:
#include "ifdbuf.h" #include <unistd.h> class ifdseek: public ifdstreambuf { typedef std::streambuf::pos_type pos_type; // 1 typedef std::streambuf::off_type off_type; typedef std::ios::seekdir seekdir; typedef std::ios::openmode openmode; // 2 public: ifdseek(int fd) // 3 : ifdstreambuf(fd) {} pos_type seekoff(off_type offset, seekdir dir, openmode) // 4 { pos_type pos = lseek ( d_fd, offset, (dir == std::ios::beg) ? SEEK_SET : (dir == std::ios::cur) ? SEEK_CUR : SEEK_END ); if (pos < 0) return -1; setg(d_buffer, d_buffer + 1, d_buffer + 1); // 5 return pos; } pos_type seekpos(pos_type offset, openmode mode) // 6 { return seekoff(offset, std::ios::beg, mode); } };
streambuf
and
ios
classes were redefined to prevent us from specifying std::streambuf
and
std::ios
again and again when reffering to these types.
lseek()
is used to seek a new position
in a file whose
file descriptor is known.
streambuf
class will
reset its input buffer pointers to an empty buffer, so that underflow()
will refill the buffer at the next request for input.
seekpos
is overridden as well:
it is actually defined by a call to seekoff()
.
ifdseek
is the following. If
this program is given its own source file using input
redirection then
seeking is supported, and apart from the first line, every other line is shown
twice:
#include "fdinseek.h" #include <string> #include <iostream> #include <istream> #include <iomanip.h> using namespace std; int main(int argc) { ifdseek fds(0); istream is(&fds); string s; while (true) { if (!getline(is, s)) break; streampos pos = is.tellg(); cout << setw(5) << pos << ": `" << s << "'\n"; if (!getline(is, s)) break; streampos pos2 = is.tellg(); cout << setw(5) << pos2 << ": `" << s << "'\n"; if (!is.seekg(pos)) { cout << "Seek failed\n"; break; } } }
buffer is refilled
19.1.2.4: Multiple `unget()' calls in `streambuf' objects
As stated earlier
streambuf
classes and classes derived from streambuf
should at least support ungetting the character that was last read. Special
care must be taken when series of unget()
calls are to be supported. In
this section we will show how to construct a class which is derived from
streambuf
and which allows a configurable number of
istream::unget()
or
istream::putback()
calls. Support for multiple (say `n
')
unget()
calls is realized by setting aside an initial section of the input
buffer, which is then gradually filled up by the last n
characters read.
Here is an implementation of a class supporting series of unget()
calls:
#include <unistd.h> #include <streambuf> #include <ios> #include <algorithm> class fdunget: public std::streambuf { int d_fd; unsigned d_bufsize; unsigned d_reserved; char* d_buffer; char* d_base; public: fdunget (int fd, unsigned bufsz, unsigned unget) // 1 : d_fd(fd), d_reserved(unget) // 2 { unsigned allocate = // 3 bufsz > d_reserved ? bufsz : d_reserved + 1; d_buffer = new char [allocate]; // 4 d_base = d_buffer + d_reserved; // 5 setg(d_base, d_base, d_base); // 6 d_bufsize = allocate - d_reserved; // 7 } ~fdunget() { delete d_buffer; } int underflow() { if (gptr() < egptr()) return *gptr(); unsigned consumed = // 8 static_cast<unsigned>(gptr() - eback()); unsigned move = std::min(consumed, d_reserved); // 9 memcpy(d_base - move, egptr() - move, move); // 10 int nread = read(d_fd, d_base, d_bufsize); // 11 if (nread <= 0) // none read -> return EOF; setg(d_base - move, d_base, d_base + nread); // 12 return *gptr(); } };
unget
determines the size of a reserved area,
which is the first d_reserved
bytes of the used buffer.
reserved
. So, a certain number of bytes will
be read, and once reserved
bytes have been read at least reserved
bytes can be unget.
d_base
, which is located reserved
bytes into the d_buffer
area. This is always the point where refills of the buffer start.
streambuf
's buffer pointers are set up. As no
characters have been read as yet, all pointers are set to d_base
. At the
next read request,
underflow()
will now know
that no characters are available, so unget()
will fail immediately.
underflow()
. Up to this
point everything is standard, and now we determine how many characters were
consumed. Remember that at 6 we ensured that the pointers indicated
0 bytes in the buffer, so consumed
is either 0 or the actual number of
bytes read into the buffer after a successful refill.
reserved
, but it is less if fewer
characters were available. So, move
is set to the minimum of the number of
consumed characters and reserved
.
d_base
.
d_base
, not from d_buffer
.
streambuf
's read buffer pointers are set up.
Eback()
is set to move
locations before
d_base
, thus defining the guaranteed unget-area,
gptr()
is set to d_base
, since that's the
location of the first read character after a refill, and
egptr()
is set just beyond the location of
the last read character.
class
fdunget
. The program reads at most 10 characters from the standard input,
stopping at EOF
. A guaranteed unget-buffer of 2 characters is defined,
in a buffer of 3 characters. Just before reading a character it is tried to
unget at most 6 characters. This is of course never possible, but the program
will neatly unget as many characters as possible, considering the actual
number of characters read:
#include "fdunget.h" #include <string> #include <iostream> #include <istream> using namespace std; int main(int argc) { fdunget fds(0, 3, 2); istream is(&fds); char c; for (int idx = 0; idx < 10; ++idx) { cout << "after reading " << idx << " characters:\n"; for (int ug = 0; ug <= 6; ++ug) { if (!is.unget()) { cout << "\tunget failed at attempt " << (ug + 1) << "\n" << "\trereading: '"; is.clear(); while (ug--) { is.get(c); cout << c; } cout << "'\n"; break; } } if (!is.get(c)) { cout << " reached\n"; break; } cout << "Next character: " << c << endl; } } /* Generated output after 'echo abcde | program': after reading 0 characters: unget failed at attempt 1 rereading: '' Next character: a after reading 1 characters: unget failed at attempt 2 rereading: 'a' Next character: b after reading 2 characters: unget failed at attempt 3 rereading: 'ab' Next character: c after reading 3 characters: unget failed at attempt 4 rereading: 'abc' Next character: d after reading 4 characters: unget failed at attempt 4 rereading: 'bcd' Next character: e after reading 5 characters: unget failed at attempt 4 rereading: 'cde' Next character: after reading 6 characters: unget failed at attempt 4 rereading: 'de ' reached */
form()
(and
scan()
) for stream objects. In this section the
construction of a class
oformstream
is described, which is derived from
ostream
and does support form()
and
vform()
. Analogously to oformstream
a class
iscanstream
can be
developed, supporting scan()
and
vscan()
. The contruction of this
latter class is left as an
exercise to the reader.
Here is the class interface and definition. It is annotated below:
#include <iostream> class oformstream: public std::ostream // 1 { public: oformstream(std::ostream &ostr) // 2 : std::ostream(ostr.rdbuf()) {} std::ostream &form(char const *fmt, ...); // 3 };
ostream
, so it
inherits ostream
's capabilities.
std::ostream
object.
form()
is declared. Its definition will be given
shortly.
A program using the class oformstream
is given in the next
example. It prints a well-known string and some marker text:
#include "oformstream.h" #include <memory> #include <stdarg.h> std::ostream &oformstream::form(char const *fmt, ...) // 1 { va_list list; va_start(list, fmt); unsigned n = vsnprintf(0, 0, fmt, list); // 2 std::auto_ptr<char> buf(new char[n + 1]); // 3 vsprintf(buf.get(), fmt, list); // 4 return *this << buf.get(); // 5 } using namespace std; int main() { oformstream of(cout); of << "Hello world\n"; cout << "Ok, "; of << "That's all, folks\n"; of.form("%s\n", "Hello world of C++") << endl; }; /* Generated output: Hello world Ok, That's all, folks Hello world of C++ */
In the above example:
form()
is defined.
vsnprintf()
is given a buffer size of 0 and
0
as the pointer to the buffer. According to the
ISO/IEC 9899:1999
standard, the function vsnprintf()
will return the number of required
characters for the formatted string (not including the final
ASCII-Z
value) (see the vsnprintf()
man page for details and possible alternative
implementations of vsnprintf()
).
auto_ptr
is used to manage the dynamically
allocated buffer.
ostream
base class, and the oformstream
object is
returned.
ios::rdbuf()
member
function. By assigning the streambuf
of a stream to another stream, both
stream objects access the same streambuf
, thus realizing redirection at
the level of the programming language itself.
It should be realized that this is fine within the context of the C++
program, but if that context is left, the redirection is terminated, as the
operating system does not know about streambuf
objects. This happens,
e.g., when a program uses a
system()
call to start a subprogram. The
following program uses C++ redirection to redirect the information
inserted into
cout
to a file, and then calls
system("echo hello world")to echo a well-known line of text. Since
echo
writes its information
to the standard output this would be the program's redirected file if
C++'s redirection would be recognized by the operating system. However,
this is not the case, so hello world
still appears at the program's
standard output and not on the redirected file. A solution of this problem
involves redirection at the operating system level, for which system calls
like
dup()
and
dup2()
are available in some operating systems (e.g.,
Unix and friends). Examples of the use of these system calls are given in
section 19.4 in this chapter.
Here is the example of the failing redirection at the system level
following C++ redirection using streambuf
redirection:
#include <iostream> #include <fstream> #include <cstdlib> using namespace::std; int main() { ofstream of("outfile"); cout.rdbuf(of.rdbuf()); cout << "To the of stream" << endl; system("echo hello world"); cout << "To the of stream" << endl; } /* Generated output: on the file `outfile' To the of stream To the of stream On standard output: hello world */
fork()
system call is well
known. When a program needs to start a new process,
system()
can be used,
but this requires the program to wait for the
child process to
terminate. The more general way to spawn subprocesses is to call fork()
.
In this section we will see how C++ can be used to wrap classes around a
complex system call like fork()
. Much of what follows in this section
directly applies to the
Unix
operating system, and the discussion will
therefore focus on that operating system. However, other systems usually
provide comparable facilities. The following discussion is based heavily on
the notion of
design patterns, as published by Gamma et al. (1995)
When fork()
is called, the current program is duplicated in memory,
thus creating a new process, and both processes continue their execution just
below the fork()
system call. The two processes may, however, inspect the
return value of fork()
: the return value in the original process (called
the
parent process) differs from the return value
in the newly created process (called the
child process):
fork()
returns the
process ID of
the child process that was created by the fork()
system call. This is a
positive integer value.
fork()
returns 0.
fork()
fails, -1 is returned.
A basic Fork
class should hide the bookkeeping details of a system
call like fork()
from its users. The Fork
class developed below will
do just that. The class itself only needs to take care of the proper execution
of the fork()
system call. Normally, fork()
is called to start a child
process, usually boiling down to the execution of a separate process. This
child process may expect input at its standard input stream and/or may
generate output to its standard output and/or standard error streams. Fork
does not know all this, and does not have to know what the child process is.
However, a Fork
object should be able to activate the child
process.
Unfortunately, it is not known to the constructor of the Fork
class
what actions the child process should perform. Similarly, it is not known
what actions the parent process should perform. For this particular situation,
the
template method design pattern
was developed. According to Gamma c.s., the template method
design pattern
``Define(s) the skeleton of an algorithm in an operation, deferring some steps to subclasses. (The) Template Method (design pattern) lets subclasses redefine certain steps of an algorithm, without changing the algorithm's structure.''
This design pattern allows us to define a
base class in which the
essential steps of the fork()
system call have already been implemented,
deferring the implementation of parts of the normal use of the fork()
system call to subclasses.
In particular, the Fork
base class will implement:
fork()
member function, which will perform the actual forking
(i.e., it will create the (new) child process);
virtual
destructor ~Fork()
, which may be
overridden by derived classes;
getPid()
function, allowing derived classes to
access the system fork()
's return value.
int waitForChild()
, which member can be called by a parent process to
wait for the completion of its child process (as discussed below).
Then, the following essential steps of the normal use of fork()
are left
to be implemented by derived classes:
virtual void childProcess()
: the member defining the actions to be
taken by the childprocess;
virtual void parentProcess()
: the member defining the actions to be
taken by the parentprocess;
virtual void childRedirections()
: this member should be implemented
if any of the standard streams (cin, cout
) or cerr
in the
childprocess must be redirected;
virtual void parentRedirections()
: this member should be implemented
if any of the standard streams (cin, cout
) or cerr
in the
parentprocess must be redirected;
The header file implementing the Fork
base class can now be
constructed. It is:
class Fork { int d_pid; public: virtual ~Fork() {} void fork(); protected: int getPid() { return d_pid; } virtual void childRedirections() // do child redirections {} virtual void childProcess() = 0; // must be implemented virtual void parentRedirections() // do parent redirections {} virtual void parentProcess() = 0; // must be implemented int waitForChild(); // returns the status };The implementation of the member function
fork()
provides for the
calling of the system function fork()
(Caution: since the system function
fork()
is called in the member function having the same name, the ::
scope resolution operator must be used to prevent a recursive call of the
member function itself), whereafter, depending on the return value of
::fork()
, either the parentProcess()
or the childProcess()
is
called. Redirection of the standard streams may be necessary. The redirection
are to be defined in childRedirections()
and parentRedirections()
:
#include "fork.ih" void Fork::fork() { if ((d_pid = ::fork()) < 0) throw "Fork::fork() failed"; if (d_pid == 0) // childprocess has pid == 0 { childRedirections(); childProcess(); exit (1); // we shouldn't come here: // childProcess should exit } parentRedirections(); parentProcess(); }In
fork.cc
the class'
internal header file fork.ih
is
included. This header file takes care of the inclusion of the necessary system
header files, as well as the inclusion of fork.h
itself. Its
implementation (omitting the
redundant include guards) is:
#include "fork.h" #include <cstdlib> #include <unistd.h> #include <sys/types.h> #include <sys/wait.h>
Normally, the child process does not return: once it has completed its task it
should terminate as a separate process. The parent process usually waits for
its child process to complete. The terminating child process informs
its parent that it is about to terminate by sending a
signal to the
parent, which should be caught by the parent. If the child terminates and the
parent process does not catch the signal generated by the child process then
the child process remains visible as a so-called
zombi process. If the
parent has to wait for the child, the member waitForChild()
can be called
in the implementation of the function parentProcess()
to wait for the
child's termination signal. Its implementation is:
#include "fork.ih" int Fork::waitForChild() { int status; waitpid(d_pid, &status, 0); return WEXITSTATUS(status); }This simple implementation basically returns the child's exit status to the parent. The called system function
waitpid()
blocks until the
child terminates.
There is, however, a situation where the child process continues to
live, but the parent dies. In nature this happens all the time: parents
tend to die before their children do. In our context (i.e. C++), this
represents a
daemon program, where the parent process dies and the child
program actually continues as a child of the basic
init
process. Again,
when the child eventually dies a signal is sent to its `
step-parent'
init
. No zombi is created here, as init
catches the termination
signals of all its
(step-) children.
The construction of a daemon process is so simple, that no special Fork
object is needed (see section 19.4.1 for an example).
fork()
is to start a
child process. The parent process terminates immediately after spawning the
child process. If this happens, the child process continues to run as a child
process of
init
, the always running first process on
Unix systems. Such
a process is often called a
daemon, running as a
background process.
Although the following example can easily be constructed as a plain C
program, it was included in the C++ Annotations because it is so closely
related to the current discussion of the Fork
class. I thought about
adding a daemon()
member to that class, but eventually decided against it
because the construction of a daemon program is very simple and uses none of
the features currently offered by the Fork
class.
Here is an example illustrating the construction of a daemon program:
#include <iostream> #include <unistd.h> #include "fork.h" class Daemon: public Fork { public: virtual void parentProcess() // the parent does nothing. {} virtual void childProcess() { sleep(3); // actions taken by the child // just a message... std::cout << "Hello from the child process\n"; exit (0); // The child process should } // exit here. }; int main() { Daemon daemon; daemon.fork(); // program immediately returns return 0; } /* Generated output: The next command prompt, then after 3 seconds: Hello from the child process */
pipe()
system call. One of the file descriptors is used for writing, the
other file descriptor is used for reading.
fork()
function is called),
which duplicates the file descriptors: both the child process and the parent
process now have their own copies of the two filedescriptors.
Pipe
class that is
constructed here. The interface of the class is:
class Pipe { private: enum RW { READ, WRITE }; int d_fd[2]; void redirect(int d_fd, int alternateFd); public: Pipe(); int useForReading(); void useForReadingFrom(int fileDescriptor); int useForWriting(); void useForWritingTo(int fileDescriptor); void useForWritingTo(int const *fileDescriptors, unsigned n = 2); };The constructor constructs a set of associated file descriptors using the
pipe()
system call. The class has two sets of member functions:
useForReading()
and useForReadingFrom()
can be used to setup
the reading side of the pipe. The former function is used to setup
redirection, so that a specific file descriptor (usually
STDIN_FILENO
)
may be used for reading;
the latter is used to setup the reading side of the pipe, and returns a file
descriptor that can be used to read from the pipe.
useForWriting()
and useForWritingTo()
can be used to setup
the writing side of the pipe. The former function comes in two overloaded
versions:
useForWritingTo(int fileDescriptor)
is used to setup single
redirection, so that a specific file descriptor (usually
STDOUT_FILENO
or
STDERR_FILENO
) may be used to write to;
(useForWritingTo(int *fileDescriptor, unsigned n = 2))
may be used
to setup multiple redirection by providing an array argument containing
the file descriptors whose information is redirected to the pipe.
Pipe
were implemented as follows:
#include "pipe.h" #include <unistd.h> Pipe::Pipe() { if (pipe(d_fd)) throw "Pipe::Pipe(): pipe() failed"; } void Pipe::redirect(int d_fd, int alternateFd) { if (dup2(d_fd, alternateFd) < 0) throw "Pipe: redirection failed"; } int Pipe::useForReading() { close(d_fd[WRITE]); return d_fd[READ]; } void Pipe::useForReadingFrom(int fd) { useForReading(); redirect(d_fd[READ], fd); close(d_fd[READ]); } int Pipe::useForWriting() { close(d_fd[READ]); return d_fd[WRITE]; } void Pipe::useForWritingTo(int const *fd, unsigned n) { useForWriting(); for (int idx = 0; idx < n; idx++) redirect(d_fd[WRITE], fd[idx]); close(d_fd[WRITE]); } void Pipe::useForWritingTo(int fd) { useForWritingTo(&fd, 1); }
Pipe
class and its derivatives
IPipe
and OPipe
are constructed here. Objects of these classes may be
used to construct sets of associated file descriptors.
Pipe
defines and initializes the file descriptor array. Here is the
definition of the class Pipe
:
class Pipe { private: enum RW { READ, WRITE }; int d_fd[2]; void redirect(int d_fd, int alternateFd); public: Pipe(); int useForReading(); void useForReadingFrom(int fileDescriptor); int useForWriting(); void useForWritingTo(int fileDescriptor); void useForWritingTo(int const *fileDescriptors, unsigned n = 2); };
Fork
and Pipe
, it's time for an application.
The class
ParentSlurp
, derived from Fork
, is a class which starts a
child process which execs a program (like /bin/ls
). The (standard)
output of the program is read by the parent, who will (for demonstration
purposes) write the received lines to its standard output stream, while
prepending linenumbers to the lines. It is most convenient here to redirect
the parents standard input stream, so that the parent can read the output from
the child process from its std::cin
input stream.
In this case, the following members must be implementated:
ParentSlurp
, initializing the pipe;
childRedirections()
member, defining its pipe as a pipe for
reading and redirecting its STDIN_FILENO
to read from its side of
the pipe.
parentRedirections()
member, defining its pipe as a pipe for
writing and redirecting its STDOUT_FILENO
to write to its side of
the pipe.
childProcess()
member, exec-ing a program (which will
write information to its standard output)
parentProcess()
member, slurping the child's output from its
standard input, to be processed further.
ParentSlurp
class:
#include "fork.h" #include "pipe.h" #include <unistd.h> class ParentSlurp: public Fork { Pipe d_pipe; public: ParentSlurp() {} protected: virtual void childRedirections() { d_pipe.useForWritingTo(STDOUT_FILENO); } virtual void parentRedirections() { d_pipe.useForReadingFrom(STDIN_FILENO); } virtual void childProcess() { execl("/bin/ls", "/bin/ls", 0); } virtual void parentProcess(); };And here is the implementation. The
parentProcess()
member repeatedly
reads lines from its standard input, until all lines have been read. Then it
waits for the child to complete. The main()
function simply constructs a
ParentSlurp
object, and calls its fork()
member. Output should consist
of a numbered list of files in the directory where the program is started. An
example is provided with the implementation given below. Note that the program
also needs the fork.o, pipe.o
and waitforchild.o
object files (see the
earlier sources):
#include "parentslurp.h" #include <string> #include <iostream> void ParentSlurp::parentProcess() { std::string line; unsigned int nr = 1; while (getline(std::cin, line)) std::cout << nr++ << ": " << line << std::endl; waitForChild(); } int main() { ParentSlurp ps; ps.fork(); return 0; } /* Generated Output (example only, actually obtained output may differ: 1: a.out 2: bitand.h 3: bitfunctional 4: bitnot.h 5: daemon.cc 6: fdinseek.cc 7: fdinseek.h ... */
start
will start a new child process. The parent will return the
ID (a number) to the user. The ID may thereupon used to send a message to the
child;
<nr> text
will send ``text
'' to child process with ID <nr>
;
stop <nr>
will terminate child process having ID <nr>
;
exit
will terminate the parent as well as all its children.
A problem with programs like these is that they allow input from multiple
sources: input may appear from the standard input as well as from
the read-side of the communication pipes. Also, multiple output channels are
used. To handle situations like this, the
select()
system call was
developed.
The
select()
system call was developed to handle synchronous I/O
multiplexing. This system call can be used to handle, e.g., input appearing
synchroneously at a set of file descriptors.
Using the select()
system call is rather complex, and its full
discussion is beyond the scope of the C++ Annotations.
However, its use may be simplified by providing a class Selector
,
which hides away the details of using select()
. The interface of such a
class is:
#ifndef _INCLUDED_SELECTOR_H_ #define _INCLUDED_SELECTOR_H_ #include <sys/time.h> #include <sys/types.h> #include <unistd.h> #include <limits.h> class Selector { fd_set d_read; fd_set d_write; fd_set d_except; fd_set d_ret_read; fd_set d_ret_write; fd_set d_ret_except; timeval d_alarm; int d_max; int d_ret; int d_readidx; int d_writeidx; int d_exceptidx; public: Selector(); int wait(); int nReady() { return d_ret; } int getReadFd() { return checkSet(&d_readidx, d_ret_read); } int getWriteFd() { return checkSet(&d_writeidx, d_ret_write); } int getExceptFd() { return checkSet(&d_exceptidx, d_ret_except); } void setAlarm(int sec, int usec = 0) { d_alarm.tv_sec = sec; d_alarm.tv_usec = usec; } void noAlarm() { setAlarm(INT_MAX, INT_MAX); } void addReadFd(int fd) { addFd(&d_read, fd); } void addWriteFd(int fd) { addFd(&d_write, fd); } void addExceptFd(int fd) { addFd(&d_except, fd); } void rmReadFd(int fd) { FD_CLR(fd, &d_read); } void rmWriteFd(int fd) { FD_CLR(fd, &d_write); } void rmExceptFd(int fd) { FD_CLR(fd, &d_except); } private: int checkSet(int *index, fd_set &set); void addFd(fd_set *set, int fd); }; #endifThe class has a rather extensive interface, but that's mainly the consequence of
select()
being a complex function. It's members are:
Selector()
: the constructor. It requires no
arguments.
int wait()
: this function will block() until
activity is sensed at any of the file descriptors that are monitored by
the Selector
object, or if the alarm times out. It will throw an
exception when select()
, called by wait()
, fails.
int nReady
: its return value is defined only
when wait()
has returned. In that case its return value is 0 for a
alarm-timeout, -1 if select()
failed, and the number of file descriptors
on which activity was sensed otherwise.
int getReadFd()
: its return value is defined
only when wait()
has returned. In that case its return value is -1 if no
more input file descriptors are available. Otherwise the next available file
descriptor for reading is returned.
int getWriteFd()
: analogously to
getReadFd()
, but it relates to the set of output-related filedescriptors.
int getWriteFd()
: analogously to
getReadFd()
, but it relates to the set of exception-related
filedescriptors.
void setAlarm(int sec, int usec = 0)
: this
member sets the alarm-time. At least the number of seconds to wait for the
alarm to go off is required.
void noAlarm()
: this member switches off the
alarm.
void addReadFd(int fd)
: this member adds
a file descriptor to the set of input file descriptors monitored by the
Selector
object.
void addWriteFd(int fd)
: this member adds a
file descriptor to the set of output file descriptors monitored by the
Selector
object.
void addExceptFd(int fd)
: this member adds
a file descriptor to the set of exception file descriptors to be monitored by
the Selector
object.
void rmReadFd(int fd)
: this member removes a
file descriptor from the set of input file descriptors monitored by the
Selector
object.
void rmWriteFd(int fd)
: this member removes a
file descriptor from the set of output file descriptors monitored by the
Selector
object.
void rmExceptFd(int fd)
: this member removes
a file descriptor from the set of exception file descriptors to be monitored
by the Selector
object.
Selector()
and wait()
were implemented as follows:
#include "selector.h" Selector::Selector() { FD_ZERO(&d_read); FD_ZERO(&d_write); FD_ZERO(&d_except); noAlarm(); d_max = 0; } int Selector::wait() { timeval t = d_alarm; d_ret_read = d_read; d_ret_write = d_write; d_ret_except = d_except; d_readidx = 0; d_writeidx = 0; d_exceptidx = 0; d_ret = select(d_max, &d_ret_read, &d_ret_write, &d_ret_except, &t); if (d_ret < 0) throw "Selector::wait()/select() failed"; return d_ret; } void Selector::addFd(fd_set *set, int fd) { FD_SET(fd, set); if (fd >= d_max) d_max = fd + 1; } int Selector::checkSet(int *index, fd_set &set) { int &idx = *index; while (idx < d_max && !FD_ISSET(idx, &set)) ++idx; return idx == d_max ? -1 : idx++; }
The child-processes started by the monitor are defined by the class
Child
. The Child
class is derived from the class Fork
, so
childProcess()
and parentProcess()
members must be defined. However,
in this case the parent process is empty, as all communications with the child
process take place in the Monitor
object (see section 19.4.4.3).
The Child
object will set up redirections in such a way that it can read
its standard input and output in the child process itself, and will make the
opposites ends of the corresponding pipes known to the parent process via two
auxiliary functions, getReadFd()
and getWriteFd()
.
Note that the reading end of a pipe in the child process corresponds to the
writing end of a pipe in the paren process (and vice versa). The Child
object is responsible for the bookkeeping that's involved here: the user of
the Child
process shouldn't have to juggle with the opposite ends of
pipes, but should be given the appropriate ends, including their
functions. Hence, getReadFd()
, which is used by the parent process, will
produce the file descriptor from which the parent may read information sent to
it by its child.
Apart from this noteworthy characteristic, Child
's interface doesn't
contain special features. Here it is:
#include "pipe.h" #include "fork.h" #ifndef _INCLUDED_SELECTOR_H_ #include "selector.h" #endif class Child: public Fork { Pipe d_in; Pipe d_out; Selector d_selector; int d_parentReadFd; int d_parentWriteFd; int d_nr; public: Child(int nr) : d_nr(nr) {} int getReadFd() { return d_parentReadFd; } int getWriteFd() { return d_parentWriteFd; } int getPid() { return Fork::getPid(); } int getNr() { return d_nr; } virtual void childRedirections(); virtual void parentRedirections(); virtual void childProcess(); virtual void parentProcess() {} };
The implementation of the member functions is given below. Please pay some
attention to the way the ends of the two pipes are used: in
childRedirections()
the redirection to the standard streams takes place,
in parentRedirections()
the d_out
pipe's reading file descriptor is
assigned to d_parentReadFd
, which is then returned by getReadFd()
. The
analogous situation holds true for the d_in
pipe.
Next, the childProcess()
will add the STDIN_FILENO
to the
d_selector
object's set of read file descriptors. Since redirection has
taken place, the child process can simply concentrate on the redirected file
descriptor.
Then, in a perpetual loop the child process waits for the Selector
object
to return from its wait()
call. When the alarm goes off, it sends a
message to its standard output (hence, into the writing pipe) that it's still
standing by, otherwise it will echo the messages appearing at its standard
input to its standard output.
Here is the implementation of the Child
members:
#include "child.h" #include <iostream> #include <string> using namespace std; void Child::childRedirections() { d_in.useForReadingFrom(STDIN_FILENO); d_out.useForWritingTo(STDOUT_FILENO); } void Child::parentRedirections() { d_parentReadFd = d_out.useForReading(); d_parentWriteFd = d_in.useForWriting(); } void Child::childProcess() { Selector selector; unsigned message = 0; selector.addReadFd(STDIN_FILENO); selector.setAlarm(5); while (true) { try { if (!selector.wait()) // timeout cout << "Child " << d_nr << ": standing by\n"; else { string line; getline(cin, line); cout << "Child " << d_nr << ":" << ++message << ": " << line << endl; } } catch (...) { cout << "Child " << d_nr << ":" << ++message << ": " << "select() failed" << endl; } } exit(0); }
The Monitor
object does all the work. The public interface of the
class Monitor
Its public interface is rather simple, as it just contains
the function to start the whole process. Here is the interface of the
class Monitor
. Note that it contains a
nested class Find
. This
nested class is discussed below, at the description of the
stopChild()
member function:
#include <string> #include <map> #ifndef _INCLUDED_SELECTOR_H_ #include "selector.h" #endif #include "child.h" class Monitor { enum Commands { UNKNOWN, START, EXIT, STOP, TEXT }; Selector d_selector; int d_nr; std::map<int, Child *> d_child; public: Monitor() : d_nr(0) {} void run(); private: static void waitForChild(int signum); static void killChild(std::map<int, Child *>::value_type it); int next(int &value, std::string &line); void processInput(); void processChild(int fd); void exiting(); void stopChild(int value); void sendChild(int value, std::string const &line); void createNewChild(); class Find { int d_nr; public: Find(int nr) : d_nr(nr) {} bool operator()(std::map<int, Child *>::value_type &vt) const { return d_nr == vt.second->getNr(); } }; };Apart from the constructor, there is just one public member:
run()
. This member does the actual child-handling, as follows:
waitForChild()
will be installed to
do just that.
Monitor
object will listen only to its standard
input: the set of input file descriptors to which select()
listens is
initialized to STDIN_FILENO
.
Selector
's wait()
function is called.
If input on cin
is available, it is processed by processInput()
,
otherwise, the input will have arived from a child process, and it is
processed by processChild()
.
run()
was implemented as follows:
#include "monitor.ih" void Monitor::run() { signal(SIGCHLD, waitForChild); d_selector.addReadFd(STDIN_FILENO); while (true) { cout << "? " << flush; try { d_selector.wait(); int fd; while ((fd = d_selector.getReadFd()) != -1) { if (fd == STDIN_FILENO) processInput(); else processChild(fd); } } catch (...) { cerr << "select failed, exiting\n"; exiting(); } } }
The member processInput()
reads the commands from the user via the
standard input. The member itself is rather simple: it calls next()
to
obtain the next command entered by the user, and then calls the corresponding
function. This member and its helper member are not complicated. The sources,
which were kept rather basic, will hardly need further elaboration at this
point in the Annotations:
#include "monitor.ih" void Monitor::processInput() { string line; int value; switch (next(value, line)) { case START: createNewChild(); break; case EXIT: exiting(); break; case STOP: stopChild(value); break; case TEXT: sendChild(value, line); break; default: cout << "unknown: " << line << "\n"; break; } } int Monitor::next(int &value, string &line) { if (!getline(cin, line)) throw "Command::next(): reading cin failed"; if (line == "start") return START; if (line == "exit") return EXIT; if (line.find("stop") == 0) { istringstream istr(line.substr(4)); istr >> value; return !istr ? UNKNOWN : STOP; } istringstream istr(line.c_str()); istr >> value; if (istr) { getline(istr, line); return TEXT; } return UNKNOWN; }
All remaining input sensed by select()
originated in a child
process. The processChild()
member is called with the appropriate file
descriptor. Then, using a
ifdstreambuf
(see section 19.1.2.1), the
input appearing at the file descriptor is read from an input stream. The
communication protocol that is used here is also rather basic: To every
line of input sent to a child, the child sends exactly one line
of text in return. Consequently, processChild()
can just read the line of
text, and return:
#include "monitor.ih" void Monitor::processChild(int fd) { ifdstreambuf ifdbuf(fd); istream istr(&ifdbuf); string line; getline(istr, line); cout << d_child[fd]->getPid() << ": " << line << endl; }Please note the construction
d_child[fd]->getPid()
that is used in
the above source. Looking at the interface of Monitor
, a map<int, Child
*>
is observed. This map contains the input file descriptor as its key,
and a pointer to the Child
object as its value. A pointer is used here,
rather than a Child
object, since we do want to use the facilities offered
by the map, but don't want to copy a Child
object.
The responsibilities to destroy the Child
object once it becomes
superfluous is now, of course, with the programmer, and not any more with the
run-time support system.
A new element is added to the map by the member createNewChild()
,
which is called in response of the user entering the start
command. Here
is its definition:
#include "monitor.ih" void Monitor::processChild(int fd) { ifdstreambuf ifdbuf(fd); istream istr(&ifdbuf); string line; getline(istr, line); cout << d_child[fd]->getPid() << ": " << line << endl; }Here the new processes are started. Once the
Child
object is created,
its fork()
member is called. This will (as fork()
uses the template
method design pattern, see section 19.4) call the child- and parent-
related functions in, respectively, the child- and parent-processes. As the
child process never returns (see section 19.4.4.2), and
Child::parentProcess()
is an empty function, the code below the
cp->fork()
call is only executed in the parent process, which is the
main program. This code simply adds the file descriptor that's available for
reading information from the child to the set of input file descriptors
monitored by select()
, and stores the pointer to the Child
object in
the d_child
map.
Direct communication with the child is required as the result of the
stop <nr>
and <nr> text
commands, which will, respectively, terminate
child process <nr>
and send the text text
to child process <nr>
.
In both cases, the program has to figure out which process belongs to a
certain child number. Here the
find_if()
generic algorithm (see section
17.4.14) was used to look for the requested child-number in the
d_child
map. The definition of the class Find
could be part of the
internal header file monitor.ih
, or, alternatively, it could be defined as
a nested private class in Monitor
's class interface. Here the former
approach was followed: Find
is defined as a nested class of Monitor
.
The stopChild()
member looks for the requested child process number
and, if found, removes its input file descriptor from the set monitored by the
select()
system call. Then the child process is terminated by calling
killChild()
, whereafter the Child
object is erased from the
map. Note that this erasure does not destroy the Child
object, so before
erasing the map-element, the Child
object is deleted first:
#include "monitor.ih" void Monitor::stopChild(int nr) { map<int, Child *>::iterator it = find_if(d_child.begin(), d_child.end(), Find(nr)); if (it != d_child.end()) { d_selector.rmReadFd(it->second->getReadFd()); killChild(*it); delete it->second; d_child.erase(it); } else cerr << "No child number " << nr << endl; }
Comparably, the sendChild()
member sends some information to a
particular child process. The same lookup action is required here as with
stopChild()
: using a Monitor::Find
object the validity of the
requested child process number is obtained, whereafter a
ofdnstreambuf
object is constructed (see section 19.1.1) to be used with an
ostream
object, to send information to the child.
Three member functions remain to be discussed:
waitForChild()
is activated by
SIGCHLD
signals sent by
terminating children. The Monitor
object simply acknowledges the
termination by catching the signal, which is further ignored.
exiting()
terminates all child processes, by visiting all elements of
the d_child
map, using the
for_each()
generic algorithm (see section
17.4.17). Since the function exits the program, the destruction of the
Child
objects is superfluous, and is skipped.
killChild()
terminates a child process by sending it the
SIGTERM
signal.
#include "monitor.ih" void Monitor::waitForChild(int signum) { int status; wait(&status); signal(SIGCHLD, waitForChild); } void Monitor::killChild(map<int, Child *>::value_type it) { if (kill(it.second->getPid(), SIGTERM)) cerr << "Couldn't kill process " << it.second->getPid() << endl; } void Monitor::exiting() { for_each(d_child.begin(), d_child.end(), killChild); exit(0); }
Finally, the program's main()
function is simply:
#include "monitor.h" int main() { Monitor monitor; monitor.run(); } /* Example of a session: # a.out ? start Child 1 started ? 1 hello world ? 3394: Child 1:1: hello world ? 1 hi there! ? 3394: Child 1:2: hi there! ? start Child 2 started ? 3394: Child 1: standing by ? 3395: Child 2: standing by ? 3394: Child 1: standing by ? s3395: Child 2: standing by ? top 1 ? 3395: Child 2: standing by ? 2 hello world ? 3395: Child 2:1: hello world ? 1 hello world No child number 1 ? exit3395: Child 2: standing by ? # */
For whatever reason, some operators are missing: there are no predefined
function objects corresponding to
bitwise operations. However, their
construction is, given the available predefined function objects, not
difficult. The following examples show a
template class implementing a
function object calling the
bitwise and (
operator&()
), and a template
class implementing a function object calling the
unary not
(
operator~()
). Other missing operators can be constructed similarly. This
is, however, left as an
exercise to the reader.
Here is the implementation of a function object calling the
bitwise
operator&()
:
#include <functional> template <typename _Tp> struct bit_and : public std::binary_function<_Tp,_Tp,_Tp> { _Tp operator()(const _Tp& __x, const _Tp& __y) const { return __x & __y; } };
Here is the implementation of a function object calling operator~()
:
#include <functional> template <typename _Tp> struct bit_not : std::public unary_function<_Tp,_Tp> { _Tp operator()(const _Tp& __x) const { return ~__x; } };
These and other
missing predefined function objects
are also implemented in the
file
bitfunctional
, which is found in one of the cplusplus
zip-archives.
An example using bit_and()
to remove all odd numbers from a vector of
int
values, using the
remove_if()
generic algorithm is:
#include <iostream> #include <algorithm> #include <vector> #include "bitand.h" using namespace std; int main() { vector<int> vi; for (int idx = 0; idx < 10; ++idx) vi.push_back(idx); copy ( vi.begin(), remove_if(vi.begin(), vi.end(), bind2nd(bit_and<int>(), 1)), ostream_iterator<int>(cout, " ") ); cout << endl; } /* Generated output: 0 2 4 6 8 */
Vector
, introduced in section
18.3 to contain a RandomAccessIterator reverse_iterator
.
First, we note that applying the ordinary iterators of the Vector
class
(made available by the begin()
and end()
member functions) to the
generic algorithms is without any problem, as the pointers that are returned
by these members are accepted by the generic algorithms. Here is a Vector
template class:
#include <algorithm> template <typename Type> class Vector { typedef Type *iterator; public: Vector() { init(0); }; Vector(unsigned n) { init(n); } Vector(Vector<Type> const &other) { construct(other); } ~Vector() { delete [] data; } Vector<Type> const &operator=(Vector<Type> const &other) { if (this != &other) { delete [] data; construct(other); } return *this; } Type &operator[](unsigned index) throw(char const *) { if (index >= (beyond - data)) throw "Vector array index out of bounds"; return data[index]; } void push_back(Type const &value) { if (!data) { init(1); beyond = data; } else if (beyond == end_of_storage) { Vector<Type> enlarged((end_of_storage - data) << 1); copy(data, beyond, enlarged.data); delete [] data; beyond = enlarged.data + (beyond - data); data = enlarged.data; end_of_storage = enlarged.end_of_storage; enlarged.data = 0; } *beyond++ = value; } iterator begin() { return data; } iterator end() { return beyond; } unsigned size() { return beyond - data; } private: void init(unsigned n) { if (n) { data = new Type[n]; beyond = data + n; end_of_storage = data + n; } else { data = 0; beyond = 0; end_of_storage = 0; } } void construct(Vector<Type> const &other) { init(other.beyond - other.data); copy(other.data, other.beyond, data); } Type *data, *beyond, *end_of_storage; };
And a simple program using Vector
:
#include <iostream> #include "vector.h" using namespace std; int main() { Vector<int> vi; for (int idx = 0; idx < 10; ++idx) vi.push_back(idx); random_shuffle(vi.begin(), vi.end()); copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " ")); cout << endl; } /* generated output, e.g., 1 7 9 3 5 6 0 4 8 2 */
However, the use of a reverse iterator is not without problems: the
reverse iterator shows rather complex behavior, in that it visits the elements
in its range backwards, resulting in a reversal of the meanings of the
operators that are used to step forward and step backward in a range. If a
reverse iterator rit
points to element 11 in a range, then it will point
to element 10, rather than 12 after ++rit
. Because of this complex
behavior, the reverse iterator will be implemented as a class
reverse_iterator
, which can do the required bookkeeping for us.
Furthermore, in the current context reverse_iterator
is a reverse
iterator only for the elements of the class Vector
. So, a
nested class seems to be appropriate here. Since Vector
has
already been defined, we will decide to leave it untouched.
Rather than modifying Vector
, a template class Vector2
is derived
of it, and Vector2
will be given its nested class reverse_iterator
.
This latter class can be derived from std::random_access_iterator
to
realize a
RandomAccessIterator for the class Vector2
:
template <typename Type> class Vector2: public Vector<Type> { public: class reverse_iterator: public std::random_access_iterator<Type, ptrdiff_t> { friend class Vector2<Type>; // see the text, below ... }; ... };The
ptrdiff_t
type that is mentioned here is ordinarilly defined as an
int
(e.g., with the
Linux
operating system it is defined as a
__kernel_ptrdiff_t
in
types.h
, which is defined as an int
in
posix_types.h
).
The reverse_iterator
class is now defined as a nested template class,
derived from the template class random_access_iterator
. Its surrounding
class is Vector2
, which itself is a template class that was derived from
the template class Vector
:
#include <iterator> #include "vector.h" template <typename Type> class Vector2: public Vector<Type> { public: Vector2() : Vector<Type>() {} Vector2(unsigned n) : Vector<Type>(n) {} class reverse_iterator: public std::random_access_iterator<Type, ptrdiff_t> { friend class Vector2<Type>; reverse_iterator(Type *begin, Type *end); Type *d_begin; Type *d_end; public: reverse_iterator(reverse_iterator const &other); // reverse_iterator &operator=(reverse_iterator const &other); Type &operator*() const; bool operator<(reverse_iterator const &other) const; bool operator==(reverse_iterator const &other) const; bool operator!=(reverse_iterator const &other) const { return !(*this == other); } reverse_iterator &operator++(); reverse_iterator operator++(int); reverse_iterator &operator--(); reverse_iterator operator--(int); int operator-(reverse_iterator const &other) const; reverse_iterator operator+(int step) const; reverse_iterator operator-(int step) const; }; reverse_iterator rbegin() { return reverse_iterator(begin(), end()); } reverse_iterator rend() { return reverse_iterator(begin(), begin()); } };
Within reverse_iterator
we store two Type
pointers, called
begin
and end
, which will be used to define the reversed range
[end - 1, begin - 1)
. In the
private section of reverse_iterator
we declare:
Type *d_begin; Type *d_end;The members
d_begin
and d_end
receive their initial values from
the begin()
and end()
members of the Vector
object.
However, the class reverse_iterator
is closely tied to Vector2
:
only Vector2
objects should be allowed to construct their own reverse
iterators. This policy is enforced by making Vector2<Type>
a
bound friend of reverse_iterator
, and by declaring its
constructor private
. The constructor expects the two Type
pointers,
defining the range [begin(), end())
. Internally begin() - 1
will be
used as endpoint of the reversed range, while end() - 1
will be used
as the beginning of the reversed range.
The constructor, defined in the private section of the
reverse_iterator
class, is therefore:
reverse_iterator(Type *begin, Type *end) : d_begin(begin), d_end(end) {}In the implementation of
reverse_iterator
, d_end
will gradually
walk towards d_begin
, which will keep its value. The dereferencing
operator will not return *d_begin
, but rather d_end[-1]
: the element
just before the element to which d_end
points. All this is allowed, as
Vector::begin()
and Vector::end()
return random access iterators
themselves, for which
pointer arithmetic operations are defined.
Since the reverse_iterator
class declares Vector2<Type>
as a
friend, the Vector2
class can implement two members, rbegin()
and
rend()
returning reverse_iterator
objects:
reverse_iterator rbegin() { return reverse_iterator(begin(), end()); } reverse_iterator rend() { return reverse_iterator(begin(), begin()); }Apart from these two members, the
Vector2
class could define two
constructors, analogous to the constructors in the Vector
class. E.g.,
Vector2() : Vector<Type>() {}
In order to allow programs to use these reverse iterators, a public
copy constructor and overloaded
assignment operator is needed in the
class reverse_iterator
.
Since the Type
pointers in reverse_iterator
are merely used for
pointing to existing information, a
destructor
is not required. Of course, this implies that the usefulness of a
reverse_iterator
terminates if the object from which it was obtained goed
out of scope. It is the
responsibility of the programmer to make sure this doesn't cause
disaster. In the public section of reverse_iterator
we add the following
members (implementations given as inline code):
reverse_iterator(reverse_iterator const &other) : d_begin(other.d_begin), d_end(other.d_end) {} reverse_iterator &operator=(reverse_iterator const &other) { d_begin(other.d_begin); d_end(other.d_end); }The implementation shown below is fairly Spartan: illegal pointer values are not detected. As the generic algorithms do not require these checks we've left them out. Of course, one could modify the implementation and incorporate checks at several locations. It's up to the reader.
Random access iterators require us to implement a series of member
functions. Here they are, as they should appear in the
public section of
the reverse_iterator
nested class. Most implementations will be
self-explanatory, and consist of single lines of code:
#include "vector2.h" #define VectorRevIt Vector2<Type>::reverse_iterator template <typename Type> VectorRevIt::reverse_iterator(reverse_iterator const &other) : d_begin(other.d_begin), d_end(other.d_end) {} template <typename Type> VectorRevIt::reverse_iterator(Type *begin, Type *end) : d_begin(begin), d_end(end) {} template <typename Type> Type &VectorRevIt::operator*() const { return d_end[-1]; } template <typename Type> Vector2<Type>::reverse_iterator VectorRevIt::operator++(int) { return reverse_iterator(d_begin, d_end--); } template <typename Type> Vector2<Type>::reverse_iterator &VectorRevIt::operator++() { --d_end; return *this; } template <typename Type> Vector2<Type>::reverse_iterator &VectorRevIt::operator--() { ++d_end; return *this; } template <typename Type> Vector2<Type>::reverse_iterator VectorRevIt::operator--(int) { return reverse_iterator(d_begin, d_end++); } template <typename Type> bool VectorRevIt::operator<(reverse_iterator const &other) const { return d_end > other.d_end; } template <typename Type> bool VectorRevIt::operator==(reverse_iterator const &other) const { return other.d_end == d_end; } template <typename Type> int VectorRevIt::operator-(reverse_iterator const &other) const { return other.d_end - d_end; } template <typename Type> Vector2<Type>::reverse_iterator VectorRevIt::operator+(int step) const { return reverse_iterator(d_begin, d_end - step); } template <typename Type> Vector2<Type>::reverse_iterator VectorRevIt::operator-(int step) const { return operator+(-step); }The following member deserve special attention:
bool operator<(reverse_iterator const &other) const
This function compares two iterators. Theoperator<()
must be interpreted as `the lvalue iterator points to an element earlier in the range than the rvalue iterator'. Since we're talking reversed iterators here, this means that the lvalue operator points to a later element than the rvalue operator in the normal iterator range. So, the overloaded operator should return true ifthis->end > other.end
, implying that the current iterator points to an element that is located earlier in the range than the other iterator. So, we implement:return d_end > other.d_end;
reverse_iterator operator++(int)
The postfix increment operator must return areverse_iterator
pointing to the current element, incrementing the pointer. Again, this implies thatend
is decremented. Since we have a constructor expecting two (normal) iterators,, the implementation again is a one-liner:return reverse_iterator(d_begin, d_end--);Using the postfix decrement withend
, areverse_iterator
object is returned representing the iterator's position at the time the postfix increment operator was called, incrementing (so, decrementingend
) the reversed iterator as a side effect.
reverse_iterator operator++(int)
Implemented analogously:return reverse_iterator(d_begin, d_end++);
int operator-(reverse_iterator const &other) const
This is the pointer arithmetic operator: it returns the difference between two pointers, representing the number of steps to make to reach the lvalue operand starting from the rvalue operand. Normally we would implement this asd_end - other.d_end
. However, since reversed iterators reduce their values to reach elements later in the range, the algebraic sign must be negated with reversed iterators:-(d_end - other.d_end)
, or, using simle algebraic manipulations:return other.d_end - d_end;
reverse_iterator operator-(int step) const
This operator allows us to use random staps backward using a
random access iterator. With a reversed iterator this means that we must
increase the value of the iterator. So we return the following
reverse_iterator
object:
return reverse_iterator(d_begin, d_end + step);
reverse_iterator operator+(int step) const
This operator does the opposite. So we implement it accordingly: return operator+(-step);
template <typename Type> // this is a template function // the class in which the Vector2<Type>::reverse_iterator<Type>:: // constructor is defined reverse_iterator(Type *begin, Type *end)// the constructor itself : d_begin(begin), d_end(end) {} template <typename Type> // this is a template function Vector2<Type>::reverse_iterator // this is its return type // the class in which the funcion Vector2<Type>::reverse_iterator<Type>:: // is defined operator+(int step) const // this is the function itself { return reverse_iterator(d_begin, d_end - step); }Finally, an example of a program using the
reversed_iterator
:
#include <iostream> #include "vector3.h" using namespace std; int main() { Vector2<int> vi; for (int idx = 0; idx < 10; ++idx) vi.push_back(idx); copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " ")); cout << endl; random_shuffle(vi.rbegin(), vi.rend()); copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " ")); cout << endl; sort(vi.rbegin(), vi.rend()); copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " ")); cout << endl; reverse(vi.rbegin(), vi.rend()); copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " ")); cout << endl; copy(vi.rbegin(), vi.rend(), ostream_iterator<int>(cout, " ")); cout << endl; } /* Generated output: 0 1 2 3 4 5 6 7 8 9 7 1 5 9 3 4 6 0 2 8 9 8 7 6 5 4 3 2 1 0 0 1 2 3 4 5 6 7 8 9 9 8 7 6 5 4 3 2 1 0 */
The current example assumes that the reader knows how to use the
scanner generator
flex
and the
parser generator
bison
. Both
bison
and flex
are well documented elsewhere. The original
predecessors of bison
and flex
, called
yacc
and
lex
are
described in several books, e.g. in
O'Reilly's book `lex & yacc'.
However, scanner- and parser generators are also (and maybe even more
commonly, nowadays) available as free software. Both bison
and flex
can be obtained from
ftp://prep.ai.mit.edu/pub/gnu. Flex
will create a C++
class when called as flex++
, or when the
-+
flag
is used.
For bison
the situation is a bit more complex. Scattered over the
Internet several bison++
archives can be found (e.g., in
ftp://ftp.rug.nl/contrib/frank/software/linux/bison/ or
rzbsdi01.uni-trier.de). The
information in these archives usually dates back to 1993, irrespective of the
version number mentioned with the archive itself.
Using flex++
and bison++
a class
-based scanner and parser can be
generated. The advantage of this approach is that the interface to the scanner
and the parser tends to become somewhat cleaner than without using the
class
interface.
Below two examples are given. In the first example only a lexical scanner is
used to monitor the production of a file from several parts. This example
focuses on the lexical scanner, and on switching files while churning through
the information. The second example uses both a scanner and a parser to
transform standard arithmetic expressions to their postfix notation, commonly
encountered in code generated by compilers and in HP
-calculators. The
second example focuses on the parser.
The setup is as follows: The
input-language knows of an
#include
statement, which is followed by a string indicating the file which should be
included at the location of the #include
.
In order to avoid complexities that have irrelevant to the current
example, the format of the #include
statement is restricted to the
form
#include <filepath>
.
The file specified between the pointed brackets should be available at
the location indicated by filepath
. If the file is not available,
the program terminates using a proper error message.
The program is started with one or two filename arguments. If the program is
started with just one filename argument, the output is written to the
standard output stream cout
. Otherwise, the output is written to
the stream whose name is given as the program's second argument.
The program uses a maximum nesting depth. Once the maximum is exceeded, the program terminates with an appropriate error message. In that case, the filename stack indicating where which file was included should be printed.
One small extra feature is that
comment-lines should be recognized:
include
directives in comment-lines should be ignored, comment being the
standard C++ comment-types.
The program is created in the following steps:
lexer
is constructed, containing the
specifications of the input-language.
lexer
the requirements for the
class Scanner
evolve. The Scanner
class is a
wrapper around the
class
yyFlexLexer
generated by
flex++
. The requirements results in the
specification of the
interface for the class Scanner
.
main()
function is constructed. A Startup
object
is created to inspect the
command-line arguments. If successful,
the scanner
's member yylex()
is called to construct the
output file.
19.7.1.1: The flex++ specification file
The organization of the
lexical scanner specification file is similar
to the one used with flex
. However, flex++
creates a class
(
yyFlexLexer
) from which the class Scanner
will be derived.
The code associated with the
regular expression rules will be located
inside the class yyFlexLexer
. However, it would be handy to access the
member functions of the derived class within that code.
Fortunately,
inheritance helps us to realize this. In the specification of
the class yyFlexLexer()
, we notice that the function
yylex()
is a
virtual function. In the FlexLexer.h
header file we see virtual
int yylex()
:
class yyFlexLexer: public FlexLexer { public: yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 ); virtual ~yyFlexLexer(); void yy_switch_to_buffer( struct yy_buffer_state* new_buffer ); struct yy_buffer_state* yy_create_buffer( istream* s, int size ); void yy_delete_buffer( struct yy_buffer_state* b ); void yyrestart( istream* s ); virtual int yylex(); virtual void switch_streams( istream* new_in, ostream* new_out ); };Consequently, if
yylex()
is defined in a derived class, then this
function of the derived class will be called from a
base class (i.e.,
yyFlexLexer
) pointer. Since the yylex()
function of the
derived class is called, that function will have access to the members
of its class, and to the public and protected members of its base class.
The context in which the generated scanner is placed is (by
default)
the function
yyFlexLexer::yylex()
. However, this context can be changed by
defining the
YY_DECL
-macro. This macro, if defined, determines the context
in which the generated scanner will be placed. So, in order to make the
generated scanner part of the derived class function yylex()
, three
things must be done:
YY_DECL
must be defined in the lexer specficiation
file. It must define the derived class function yylex()
as the scanner
function. For example:
#define YY_DECL int Scanner::yylex()
yylex()
must be declared in the class definition of
the derived class.
yyFlexLexer::yylex()
is a virtual function, it
must still be defined. It is not called, though, so its definition may be a
simple:
int yyFlexLexer::yylex() { return 0; }
YY_DECL
macro and the yyFlexLexer::yylex()
function can conveniently be placed in the lexer specification file, as shown
below.
Looking at the regular expressions themselves, notice that we'll need rules
for the recognition of the comment, for the
#include
directive, and for the remaining characters. This is all fairly
standard practice. When an #include
directive is detected, the directive
is parsed by the scanner. This too is
common practice. Here is what will
happen in the current setup:
Scanner::Error
value (invalidInclude
) if this
fails;
nextSource
;
#include
directive
switchSource()
is called to perform the switch to another file.
EOF
) is detected, the derived class'
member function popSource()
is called, which will pop the previously
pushed file, returning
true
.
false
,
resulting in the call of yyterminate()
, which will terminate the scanner.
The lexical scanner specification file has three sections:
%option yylineno
when the
lexical scanner should keep track of the
line numbers of the files it is
scanning;
istream
*yyin
to the
ostream
*yyout
, for which the predefined macro
ECHO
can be used. The complete and annotated lexical scanner
specification file to be processed by
flex++
is:
%{ /* ---------------------------------------------------------------------------- C++ -preamble. Include header files, other than those generated by flex++ and bison++. E.g., include the interface to the class derived from yyFlexLexer ----------------------------------------------------------------------------*/ // the yylex() function that's actually // used #define YY_DECL int Scanner::yylex() #include "scanner.ih" // The interface of the derived class int yyFlexLexer::yylex() // not called: overruled by { // Scanner::yylex() return 0; } %} /* ---------------------------------------------------------------------------- Flex++ symbol area ~~~~~~~~~~~~~~~~~~ The symbols mentioned here are used for defining e.g., a mini scanner ---------------------------------------------------------------------------- */ %x comment %x include %option yylineno eolnComment "//".* anyChar .|\n /* ---------------------------------------------------------------------------- Flex rules area: ~~~~~~~~~~~~~~~~ Regular expressions below here define what the lexer will recognize. ---------------------------------------------------------------------------- */ %% /* The comment-rules: comment lines are ignored. */ {eolnComment} "/*" BEGIN comment; <comment>{anyChar} <comment>"*/" BEGIN INITIAL; /* File switching: #include <filepath> */ #include[ \t]+"<" BEGIN include; <include>[^ \t>]+ nextSource = yytext; <include>">"[ \t]*\n { BEGIN INITIAL; performSwitch(); } <include>{anyChar} throw invalidInclude; /* The default rules: eating all the rest, echoing it to output */ {anyChar} ECHO; /* The <<EOF>>)rule: pop a pushed file, or terminate the lexer */ <<EOF>> { if (!popSource()) yyterminate(); }Since the derived class is able to access the information stored within the lexical scanner itself (it can even access the information directly, since the data members of
yyFlexLexer
are
protected, and thus
accessible to derived classes), most processing can be done by the derived
class' member functions. This results in a very clean setup of the lexer
specification file, which requires hardly any code in the preamble.
19.7.1.2: The derived class: Scanner
The class Scanner
is derived from the class
yyFlexLexer
as generated
by
flex++
. The derived class has access to the data controlled by the
lexical scanner. In particular, the derived class has access to the following
data
members:
char *yytext
: contains the
text
matched by a
regular expression (accessible outside of the scanner from the
YYText()
member);
int yyleng
: the
length of
the text in yytext
(accessible outside of the scanner from the
YYLeng()
member);
int yylineno
: the current
line number (only if
%option yylineo
was specified in the
lexer specfication file, accessible outside of the scanner
from the
lineno()
member);
FlexLexer.h
, which is part of the
flex
distribution.
The class Scanner
performs two tasks:
EOF
is detected in a
file.
Scanner
class the final
header file is given. Note
that, apart from the private member functions, several private data members
are used as well. These members are initialized by the Scanner()
constructor and are used in the private member functions. They are
discussed below, in the context of the member functions that are using
them. Here is the scanner.h
header file:
#include <FlexLexer.h> // provides yyFlexLexer interface #include <fstream> #include <string> #include <stack> #include <vector> class Scanner: public yyFlexLexer { public: enum Error { invalidInclude, circularInclusion, nestingTooDeep, cantRead, }; Scanner(istream *yyin, string const &initialName); string const &lastFile() { return fileName.back(); } void stackTrace(); // dumps filename stack to cerr int yylex(); // overruling yyFlexLexer's yylex() private: Scanner(Scanner const &other); // no Scanner copy-initialization Scanner &operator=(Scanner const &other); // no assignment either bool popSource(); // true: source popped void performSwitch(); void throwOnCircularInclusion(); stack<yy_buffer_state *> state; vector<string> fileName; string nextSource; static unsigned const sizeof_buffer = 16384, maxDepth = 10; };The scanning process proceeds as follows: Once the scanner extracts a filename from an
#include
directive, a
switch to another file is performed by performSwitch()
. If the filename
could not be extracted, the scanner throws an invalidInclude
exception
value.
The performSwitch()
member and the matching function popSource()
handle the file switch. In particular, note that
yylineno
is not updated
when a
file switch is performed. If line numbers are to be monitored, then
the current value of yylineno
should be pushed on a stack, and
yylineno
should be reset by performSwitch()
, while popSource()
should reinstate a former value of yylineno
by popping a previously pushed
value from the stack.
In the current implementation of Scanner
a simple
stack of
yy_buffer_state
pointers is maintained. Changing that into a stack of
pair<yy_buffer_state *, unsigned>
elements allows you to save the line
numbers. This modification is left as an
exercise to the reader.
As mentioned. performSwitch()
performs the actual file-switching. The
yyFlexLexer
class provides a series of member functions that can be used
for file switching purposes. The file-switching capability of a
yyFlexLexer
object is founded on the struct yy_buffer_state
,
containing the state of the scan-buffer of the file that is currently scanned
by the lexical scanner. This buffer is pushed on the state
stack when an
#include
is encountered. Then this buffer is replaced by the buffer that
will be created for the file that is mentioned in the #include
directive.
The actual
file switching is realized as follows:
include
-nesting is inspected. If
maxDepth
is reached, the stack is considered full, and the scanner throws
a nestingTooDeep
exception.
fileName
vector is inspected, to avoid
circular inclusions. If nextSource
is encountered in the
fileName
vector, the inclusion is refused, and the scanner throws a
circularInclusion
exception. The member function
throwOnCircularInclusion()
is called for this task.
ifstream
object is created, for the filename
in nextSource
. If this fails, the scanner throws a cantRead
exception.
yy_buffer_state
is created for the newly opened
stream, and the lexical scanner is instructed to switch to that
stream using yyFlexLexer
's member function
yy_switch_to_buffer()
.
performSwitch()
and
throwOnCircularInclusion()
are:
#include "scanner.ih" void Scanner::performSwitch() { if (state.size() == maxDepth) throw nestingTooDeep; fileName.push_back(nextSource); throwOnCircularInclusion(); ifstream *newStream = new ifstream(nextSource.c_str()); if (!*newStream) throw cantRead; state.push(yy_current_buffer); yy_switch_to_buffer(yy_create_buffer(newStream, sizeof_buffer)); }
#include "scanner.ih" void Scanner::throwOnCircularInclusion() { vector<string>::iterator it = find(fileName.begin(), fileName.end(), nextSource); if (it != fileName.end()) throw circularInclusion; }The member function
popSource()
is called to pop the previously pushed
buffer from the stack, to continue its scan just beyond the just processed
#include
directive. The popSource()
function first inspects
the size of the state
stack: if empty, false
is returned and the
fuction terminates. if not empty, then the current buffer is
deleted, to be replaced by the state waiting on top of the stack. The file
switch is performed by the yyFlexLexer
members
yy_delete_buffer()
and
yy_switch_to_buffer
. Note that yy_delete_buffer()
takes care of the
closing of the ifstream
and of deleting the memory allocated for this
stream in performSwitch()
. Furthermore, the filename that was last entered
in the fileName
vector is removed. Having done all this, the function
returns true
:
#include "scanner.ih" bool Scanner::popSource() { if (state.empty()) return false; yy_delete_buffer(yy_current_buffer); yy_switch_to_buffer(state.top()); state.pop(); fileName.pop_back(); return true; }These functions complete the implementation of the complete lexical scanner. the lexical scanner itself is stored in the
Scanner::yylex()
function. The Scanner
object itself only has three public member
functions and a constructor. One member function is yylex()
, the other two
allow the called to dump a stack trace and to obtain the name of the file that
was last processed by the scanner. The constructor is a simple piece of
code. Here is its source:
#include "scanner.h" Scanner::Scanner(istream *yyin, string const &initialName) { switch_streams(yyin, yyout); fileName.push_back(initialName); }
The main program is very simple. As the program expects a filename to start
the scanning process at, initially the number of arguments is checked. If at
least one argument was given, then a ifstream
object is created. If this
object can be created, then a Scanner
object is constructed, receiving the
address of the ifstream
object and the name of the file as its
arguments. Then the yylex()
member function of the Scanner
object is
called. This function is inherited from the Scanner
's base class
yyFlexLexer
. If anything fails inside the scanner, an exception is thrown,
which is caught at the end of the main()
function:
/* lexer.cc A C++ main()-frame generated by C++ for lexer.cc */ #include "lexer.h" /* program header file */ int main(int argc, char **argv) { if (argc == 1) { cerr << "Filename argument required\n"; exit (1); } ifstream yyin(argv[1]); if (!yyin) { cerr << "Can't read " << argv[1] << endl; exit(1); } Scanner scanner(&yyin, argv[1]); try { return scanner.yylex(); } catch (Scanner::Error err) { char const *msg[] = { "Include specification", "Circular Include", "Nesting", "Read", }; cerr << msg[err] << " error in " << scanner.lastFile() << ", line " << scanner.lineno() << endl; scanner.stackTrace(); return 1; } }
19.7.1.4: Building the scanner-program
The final program is constructed in two steps. These steps are given
for a
Unix system, on which
flex++
and the
Gnu C++
compiler
g++
have been installed:
flex++
. For
this the following command can be given:
flex++ lexer
libfl.a
library.
The appropriate command here is:
g++ -o scanner *.cc -lflNOTE: Don't forget to link against libfl.a!
flex++
is called with the
-d
flag, debugging code will be part of the
generated scanner. Apart from that, the debugging code must be
activated. Assuming the scanner object is called scanner
, the statement
scanner.set_debug(1);
must be given after the construction of the scanner
object.
Starting point for a program that uses both a parser and a scanner is the grammar: the grammar is specified first. This results in a set of tokens which can be returned by the lexical scanner (commonly called the lexer).
Finally, auxiliary code is provided to `fill in the blanks': the actions which are performed by the parser and the lexer are not normally specified in the grammatical rules or lexical regular expressions, but are executed by member functions, which are called from within the parser's rules or are associated with the lexer's regular expressions.
In the previous section we've seen an example of a C++ class generated by
flex++
. In the current section we concern ourselves with the parser. The
parser can be generated from a grammar specification to be processed by the
program
bison++
. The grammar specification required for bison++
is
similar to the specifications required for
bison
, but bison++
will
generate a class, rather than a single function.
In the next sections we'll develop a program converting
infix expressions, in which binary operators are written between
their operands, to
postfix expressions, in which binary operators are
written behind their operands. Comparably, the
unary operator -
will be converted from its prefix notation to a postfix
form (We ignore the unary +
operator here for the sake of brevity).
Our calculator will recognize a minimal set of operators: multiplication, addition, parentheses, and the unary minus. We'll distinguish real numbers from integers, to illustrate a subtlety in the bison-like grammar specifications, but that's about it: the purpose of this section is, after all, to illustrate a C++ program, using a parser and a lexer, and not to construct a full-fledged calculator.
In the upcoming sections we'll develop the grammar specification for
bison++
. Then, the regular expressions for the scanner
are specified according to the requirements of flex++
. Finally the program
is constructed.
The class-generating bison software (bison++) is not widely available. The version used by us is 2.20. It can be obtained at ftp://ftp.rug.nl/contrib/frank/software/linux/bison/bison++2.20.tar.gz.
19.7.2.1: The bison++ specification file
The
grammar specification file used for
bison++
is comparable to the
specification file used for
bison
. Differences are related to the class
nature of the resulting parser. Our calculator will distinguish real numbers
from integers, and will support the basic set of arithmetic operators.
The bison++
specification file consists of the following sections:
bison
. The
difference being the
%header{...}
opening. In this section we'll encounter
mainly
declarations: header files are included, among which is
the scanner's header file.
bison++
has several extra items that can be declared
here. They are important and are discussed in a section of their own.
bison
program.
19.7.2.2: The bison++ token section
The token section contains all the tokens that are used in the grammar, as well as the priority rules as used for the mathematical operators. Moreover, several new items can be declared here:
%name ParserName
. The name ParserName
will be
the name of the parser's class. This entry should be the first entry of the
token section. It is used in cases where
multiple grammars are used, to
make sure that the different parser-classes use unique identifiers. By
default the name
parse
is used.
%define name content
. The %define
has the same
function as the #define
statement for the C++
preprocessor. It can be
used to define, e.g., a
macro. Internally, the defined symbol will be the
concatenation of
YY_
, the parser's class name, and the name of the
macro. E.g.,
YY_ParserName_nameSeveral symbols will normally be defined here. Of all the definitions that can be given here, the following two are required:
%define LEX_BODY { inline-code }
:
here the body of the call to the
lexer is defined. It can be defined as
= 0
for an abstract parser-class, but otherwise it must contain the code
(including surrounding curly braces) representing the call to the lexer. For
example, if the lexer object generated by flex++
is called lexer
, this
declaration should be
%define LEX_BODY { return lexer.yylex(); }
%define ERROR_BODY { inline-code }
:
similarly, the body of the code of the call to the error-function is defined
here. It can be defined as = 0
, in which case the parser's class will
again become abstract. Otherwise, it is used to specify the inner workings of
the error function, including surrounding braces. E.g.,
%define ERROR_BODY { cerr << "syntax Error\n"; }
When the
LEX_BODY
and
ERROR_BODY
definitions are omitted, then the
compiler is not able to construct the
virtual table of the parser class,
and the linker will report an error like
undefined reference to `Parser virtual table'
The remaining symbols are optional, and can be defined when needed:
%define DEBUG 1
: if non-0 debugging
code will be included in the parser's source. However, in order to make the
debugging code active, the member variable yydebug
should be set to
1. This can be done in, e.g., the constructor code or the constructor member
initialization section (see below) (Apparently, %define DEBUG 1
isn't required. Just setting yydebug = 1
appears to be all that is needed
to activate the parser's debugging code).
%define ERROR_VERBOSE
: if
defined, the parser's stack will be dumped when an error occurs.
%define LVAL yylval
: the default
variable name is shown here: the variable name containing the parser's
semantic value is by
default
yylval
, but its name may be redefined
here.
%define INHERIT : public ClassA,
public ClassB
: the
inheritance list for the parser's class. Note that it
starts with a
colon. The space character between INHERIT
and the colon may be omitted. The %define
itself should be omitted if the
parser class is not derived from another class.
%define MEMBERS member-prototypes
: if
the parser
contains extra members, they must be declared
here. Note that there is only one %define MEMBERS
definition allowed. So,
if multiple members are to be declared, they must all be declared at this
point. To prevent
very long lines in
the specification file, the \ (
backslash) can be used at the end of a line,
to indicate that it continues on the next line of the source-text. E.g.,
%define MEMBERS void lookup(); \ void lookdown();The
MEMBERS
section starts as a
public
section. If
private members are required too, a private:
directive can be
used in the MEMBERS
section.
%defines
can be used:
%define
CONSTRUCTOR_PARAM parameterlist
: this defines the parameterlist for the
parser's constructor. Here the types and names of the parameters of the
parser's constructor should be given. The surrounding
parentheses of the
parameter list are not part of the CONSTRUCTOR_PARAM
definition.
%define CONSTRUCTOR_INIT
: initializer(s)
: this defines the
base class and
member initializers of the constructor. Note the
initial
colon following CONSTRUCTOR_INIT
, which is required. The colon
may be given immediately after the CONSTRUCOR_INIT
statement, or blanks
may be used to separate the symbol from the colon.
%define CONSTRUCTOR_CODE
{ code }
: this defines the code of the parser's constructor (including
surrounding curly braces).
parse parser;
%union
. This starts the definition of the
semantical value union. It replaces the
#define YYSTYPE
definition seen
with bison
. An example of a %union
declaration is
%union { int i; double d; };The union cannot contain objects as its fields, as constructors cannot be called when a union is created. This means that a
string
cannot be a member of the
union. A string *
, however, is a possible union member. As a side
line: the
lexical scanner does not have to know about this union. The
scanner can simply pass its scanned text to the parser through its
YYText()
member function. For example, a
statement like:
$$.i = atoi(scanner.YYText());
can be used to convert matched text to a value of an appropriate type.
$$
or the
generic return values $1, $2
, etc, that are associated with components of
rules can be used, rather than $$.i, $3.d
, etc. To associate a
non-terminal or a token with a union field, the
<fieldname>
specification is
used. E.g.,
%token <i> INT // token association (deprecated, see below) <d> DOUBLE %type <i> intExpr // non-terminal associationIn this example, note that both the tokens and the non-terminals can be associated with a field of the union. However, as noted earlier, the lexical scanner does not have to know about all this. In our opinion, it is cleaner to let the scanner do just one thing: scan texts. The parser knows what the input is all about, and may convert strings like
"123"
to an integer
value. Consequently, we are discouraging the association of a union field and
a
token. In the upcoming
description of the rules of the grammar this will be illustrated further.
%union
discussion the
%token
and
%type
specifications should be noted. They are used for
the specficiation of the tokens (
terminal symbols) that can be returned by
the lexical scanner, and for the specification of the return types of
non-terminals. Apart from %token
the
token indicators
%left
,
%right
and
%nonassoc
may be used to specify the
associativity of operators. The tokens
mentioned at these indicators are interpreted as tokens indicating operators,
associating in the indicated direction. The
precedence of operators is given
by their order: the first specification has the lowest priority. To overrule a
certain precedence in a certain context,
%prec
can be
used. As all this is standard bison
practice, it isn't further discussed
in this context. The documentation provided with the bison
distribution
should be consulted for further reference.
19.7.2.3: The bison++ grammar rules
The rules and actions of the grammar are specified as usual. The grammar for our little calculator is given below. A lot of rules, but they illustrate the use of non-terminals associated with value-types.
lines: lines line | line ; line: intExpr '\n' { cerr << "int: " << $1 << endl; } | doubleExpr '\n' { cerr << "double: " << $1 << endl; } | '\n' { cout << "Good bye\n"; YYACCEPT; } | error '\n' ; intExpr: intExpr '*' intExpr { $$ = $1 * $3; } | intExpr '+' intExpr { $$ = $1 + $3; } | '(' intExpr ')' { $$ = $2; } | '-' intExpr %prec UnaryMinus { $$ = -$2; } | INT { $$ = atoi(lexer.YYText()); } ; doubleExpr: doubleExpr '*' doubleExpr { $$ = $1 * $3; } | doubleExpr '+' doubleExpr { $$ = $1 + $3; } | doubleExpr '*' intExpr { $$ = $1 * $3; } | doubleExpr '+' intExpr { $$ = $1 + $3; } | intExpr '*' doubleExpr { $$ = $1 * $3; } | intExpr '+' doubleExpr { $$ = $1 + $3; } | '(' doubleExpr ')' { $$ = $2; } | '-' doubleExpr %prec UnaryMinus { $$ = -$2; } | DOUBLE { $$ = atof(lexer.YYText()); } ;Using these rules a simple calculator is defined in which integer and real values can be negated, added, and multiplied, and in which standard priority rules can be circumvented using parentheses. The rules show the use of typed nonterminal symbols:
doubleExpr
is linked to real (double)
values, intExpr
is linked to integer values. Precedence and type
association is defined in the token section of the parser specification file,
which is:
%name Parser %union { int i; double d; }; %token INT DOUBLE %type <i> intExpr <d> doubleExpr %left '+' %left '*' %right UnaryMinus %define MEMBERS \ virtual ~Parser() {} \ private: \ Scanner lexer; %define LEX_BODY {return lexer.yylex();} %define ERROR_BODY { cerr << "error encountered\n"; }In the token section we see the use of the
%type
specifiers,
connecting intExpr
to the i
-field of the semantic-value union, and
connecting doubleExpr
to the d
-field. At first sight it looks
complex, since the expression rules must be included for each individual
return type. On the other hand, if the union itself would have been used, we
would have had to specify somewhere in the returned semantic values what field
to use: less rules, but more complex and error-prone code.
Also, note
that the lexical scanner is
included as a member of the parser. Although there's really no need here to
define the scanner as a separate Scanner
class, but doing so shows the
general approach we're advocating here. In the current situation, the
definition of the scanner class is extremely simple:
#include <FlexLexer.h> // provides yyFlexLexer interface class Scanner: public yyFlexLexer { };In general, the scanner can well be defined as a (private) member of the parser, as it's not used outside of the parser object. The parser's virtual destructor is included as an member to prevent the compiler from complaining about the parser having a non-virtual destructor. E.g.,
warning: `class Parser' has virtual functions but non-virtual destructor
19.7.2.4: The flex++ specification file
The flex-specification file that is used by our calculator is
simple: blanks are skipped, single characters are returned, and
numerical values are returned as either Parser::INT
or
Parser::DOUBLE
tokens. Here is the complete flex++
specification file:
%{ #include "parser.h" %} %option yylineno %% [ \t] ; [0-9]+ return(Parser::INT); "."[0-9]* | [0-9]+("."[0-9]*)? return(Parser::DOUBLE); exit | quit return (Parser::DONE); .|\n return (*yytext);
19.7.2.5: The generation of the code
The code
is generated in the
same way as with bison
and flex
. To order bison++
to generate the
files parser.cc
and parser.h
, give the command:
bison++ -d -o parser.cc parserNext,
flex++
will generate code on
lexer.cc
using the command
flex++ -I -olexer.cc lexerNote here that
flex++
expects no blanks between the -o
flag
and lexer.cc
.
On Unix systems, linking and compiling the generated sources and the source for the main program (given below) is realized by the command:
g++ -o calc -Wall *.cc -lfl -sNote the fact that the program is linked against the
libfl.a
library.
If it's not provided, the
linker will complain about
unresolved references like
yywrap()
.
A source in which the main()
function and the parser object (having
the lexical scanner as one of its data members) is, finally:
#include "parser.h" int main() { Parser parser; cout << "Enter (nested) expressions containing ints, doubles, *, + and " "unary -\n" "operators. Enter an empty line, exit or quit to exit.\n"; return parser.yyparse(); }