The Semantics of Constructors

Default Constructor Construction:

There are four characteristics of a class under which the compiler needs to synthesize a default constructor for classes that declare no constructor at all. The Standard refers to these as implicit, nontrivial default constructors. The synthesized constructor fulfills only an implementation need. It does this by

  1. Invoking the default constructors of member class objects,
  2. Invoking the default constructors of base classes,
  3. Initializing the virtual function mechanism for each object, or
  4. Initializing the virtual base class mechanism for each object.

Classes that do not exhibit these characteristics and that declare no constructor at all are said to have implicit, trivial default constructors. In practice, these trivial default constructors are not synthesized.
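The split between trivial and nontrivial default constructors can be observed with the standard type traits. The following is a minimal sketch, not taken from the text: all class names are hypothetical, and each class exhibits exactly one of the four characteristics.

```cpp
#include <string>
#include <type_traits>

// Hypothetical classes, one per characteristic in the list above.
struct PlainData       { int i; int *p; };                  // none of the four
struct WithMember      { std::string name; };               // 1. member object with a default constructor
struct CtorBase        { CtorBase() {} };
struct WithBase        : CtorBase { int i; };               // 2. base class with a default constructor
struct WithVirtual     { virtual void f() {} int i; };      // 3. virtual function
struct SharedBase      { int v; };
struct WithVirtualBase : virtual SharedBase { int i; };     // 4. virtual base class

// Only PlainData has a trivial default constructor.
static_assert(  std::is_trivially_default_constructible<PlainData>::value,       "trivial" );
static_assert( !std::is_trivially_default_constructible<WithMember>::value,      "nontrivial" );
static_assert( !std::is_trivially_default_constructible<WithBase>::value,        "nontrivial" );
static_assert( !std::is_trivially_default_constructible<WithVirtual>::value,     "nontrivial" );
static_assert( !std::is_trivially_default_constructible<WithVirtualBase>::value, "nontrivial" );
```

If the file compiles, every assertion holds; no code needs to run.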

Within the synthesized default constructor, only the base class subobjects and member class objects are initialized. All other nonstatic data members, such as integers, pointers to integers, arrays of integers, and so on, are not initialized. These initializations are needs of the program, not of the implementation. If there is a program need for a default constructor, such as initializing a pointer to 0, it is the programmer’s responsibility to provide it in the course of the class implementation.

Programmers new to C++ often have two common misunderstandings:

  1. That a default constructor is synthesized for every class that does not define one
  2. That the compiler-synthesized default constructor provides explicit default initializers for each data member declared within the class

As you have seen, neither of these is true.
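When the program does need its scalar members initialized, the class has to say so itself. A minimal sketch, assuming a hypothetical Node class whose pointer and counter must start at known values:

```cpp
// Hypothetical list node. The pointer and the counter are program needs,
// so the class author initializes them explicitly.
class Node {
public:
    Node() : _next( nullptr ), _count( 0 ) {}

    Node *next()  const { return _next; }
    int   count() const { return _count; }

private:
    Node *_next;
    int   _count;
};
```

Without the explicit constructor, a default-initialized local Node would leave both members with indeterminate values.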

When a class declares a virtual function, the following two class "augmentations" occur during compilation:

  1. A virtual function table (referred to as the class vtbl in the original cfront implementation) is generated and populated with the addresses of the active virtual functions for that class.
  2. Within each class object, an additional pointer member (the vptr) is synthesized to hold the address of the associated class vtbl.
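The second augmentation shows up in the size of the object. The sketch below uses two hypothetical classes that differ only in the presence of a virtual function. The exact sizes are implementation-defined, so the only assumption made is that the virtual version is larger, as it is on mainstream compilers.

```cpp
#include <cstddef>

// Hypothetical classes with the same data member.
struct PlainPoint   { float x; };
struct VirtualPoint { virtual ~VirtualPoint() {} float x; };

// Extra bytes per object attributable to the synthesized vptr plus any padding.
inline std::size_t virtual_overhead() {
    return sizeof( VirtualPoint ) - sizeof( PlainPoint );
}
```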

Copy Constructor Construction:

When are bitwise copy semantics not exhibited by a class? There are four instances:

  1. When the class contains a member object of a class for which a copy constructor exists (either explicitly declared by the class designer, or synthesized by the compiler)
  2. When the class is derived from a base class for which a copy constructor exists (again, either explicitly declared or synthesized)
  3. When the class declares one or more virtual functions
  4. When the class is derived from an inheritance chain in which one or more base classes are virtual
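The same four instances can be checked with std::is_trivially_copy_constructible, which reports whether a class's copy constructor is trivial, that is, whether bitwise copy is all the copy needs. This is a sketch with hypothetical class names, not code from the text.

```cpp
#include <string>
#include <type_traits>

struct BitwiseOk         { int i; float f; };                                  // bitwise copy semantics
struct CopyMember        { std::string s; };                                   // 1. member with a copy constructor
struct CopyBase          { CopyBase() {} CopyBase( const CopyBase & ) {} };
struct CopyDerived       : CopyBase { int i; };                                // 2. base with a copy constructor
struct CopyVirtual       { virtual void f() {} int i; };                       // 3. virtual function
struct CopyShared        { int v; };
struct CopyVirtualBase   : virtual CopyShared { int i; };                      // 4. virtual base class

static_assert(  std::is_trivially_copy_constructible<BitwiseOk>::value,       "bitwise" );
static_assert( !std::is_trivially_copy_constructible<CopyMember>::value,      "not bitwise" );
static_assert( !std::is_trivially_copy_constructible<CopyDerived>::value,     "not bitwise" );
static_assert( !std::is_trivially_copy_constructible<CopyVirtual>::value,     "not bitwise" );
static_assert( !std::is_trivially_copy_constructible<CopyVirtualBase>::value, "not bitwise" );
```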

Program Transformation Semantics:

Requiring that a temporary be constructed and then copied in every such initialization would levy a possibly severe performance penalty on a great many programs. For example, although the following three initializations are semantically equivalent:

X xx0( 1024 );
X xx1 = X( 1024 );
X xx2 = ( X ) 1024;

In the second and third instances, the syntax explicitly provides for a two-step initialization:

  1. Initialize a temporary object with 1024.
  2. Copy construct the explicit object with the temporary object.

That is, whereas xx0 is initialized by a single constructor invocation

// Pseudo C++ Code
xx0.X::X( 1024 );

a strict implementation of either xx1 or xx2 results in two constructor invocations, a temporary object, and a call to the destructor of class X on that temporary object:

// Pseudo C++ Code
X __temp0;
__temp0.X::X( 1024 );
xx1.X::X( __temp0 );
__temp0.X::~X();
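How many constructor calls an implementation actually makes can be counted. The sketch below uses a hypothetical Counted class in place of X. Under C++17 and later the temporary is elided by rule, so the copy count stays at zero. Earlier standards merely permit the elision, so the count there depends on the compiler and its options.

```cpp
// Hypothetical stand-in for class X that counts its constructor calls.
struct Counted {
    static int conversions;   // calls to Counted( int )
    static int copies;        // calls to the copy constructor

    Counted( int )             { ++conversions; }
    Counted( const Counted & ) { ++copies; }
};
int Counted::conversions = 0;
int Counted::copies      = 0;

// The three initialization forms discussed above.
inline void three_initializations() {
    Counted xx0( 1024 );
    Counted xx1 = Counted( 1024 );
    Counted xx2 = ( Counted ) 1024;
}
```

After one call to three_initializations(), Counted::conversions is 3 in every case. Counted::copies is at most 2, reached only when neither temporary is elided.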

The simplest method of implementing the copy constructor is as follows:

Point3d::Point3d( const Point3d &rhs )
{
   _x = rhs._x;
   _y = rhs._y;
   _z = rhs._z;
}

This is okay, but use of the C library memcpy() function would be more efficient:

Point3d::Point3d( const Point3d &rhs )
{
   memcpy( this, &rhs, sizeof( Point3d ) );
}

Use of both memcpy() and memset(), however, works only if the classes do not contain any compiler-generated internal members. If the Point3d class declares one or more virtual functions or contains a virtual base class, use of either of these functions will result in overwriting the values the compiler set for these members.

As you can see, correct use of the memset() and memcpy() functions requires some knowledge of the C++ Object Model semantics!
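One way to let the compiler enforce that knowledge is to guard the memcpy() call with std::is_trivially_copyable. This is a sketch of that idea, not a technique from the text: the helper bitwise_copy and both point classes are hypothetical.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical point classes: one plain, one carrying a vptr.
struct Point3dPod     { float _x, _y, _z; };
struct Point3dVirtual { virtual ~Point3dVirtual() {} float _x, _y, _z; };

// Copies src over dst byte for byte, but only compiles for types
// that have no compiler-generated internal members to overwrite.
template <typename T>
void bitwise_copy( T &dst, const T &src ) {
    static_assert( std::is_trivially_copyable<T>::value,
                   "T is not safe to copy with memcpy" );
    std::memcpy( &dst, &src, sizeof( T ) );
}
```

Calling bitwise_copy on two Point3dPod objects compiles and copies all three coordinates. The same call on Point3dVirtual objects is rejected at compile time.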

Member Initialization List:

You must use the member initialization list in the following cases in order for your program to compile:

  1. When initializing a reference member.
  2. When initializing a const member.
  3. When invoking a base or member class constructor with a set of arguments.
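The three cases can appear together in one constructor. A minimal sketch with hypothetical classes IdBase and Widget. Removing any of the three entries from the list makes the constructor fail to compile.

```cpp
// Hypothetical base class whose only constructor takes an argument.
class IdBase {
public:
    explicit IdBase( int id ) : _id( id ) {}
    int id() const { return _id; }
private:
    int _id;
};

class Widget : public IdBase {
public:
    Widget( int &counter, int limit )
        : IdBase( 7 ),           // 3. base class constructor with arguments
          _counter( counter ),   // 1. reference member
          _limit( limit )        // 2. const member
    {}

    void bump()        { ++_counter; }
    int  limit() const { return _limit; }

private:
    int       &_counter;
    const int  _limit;
};
```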

The order in which the list entries are set down is determined by the declaration order of the members within the class declaration, not the order within the initialization list.

In summary, the compiler iterates over and possibly reorders the initialization list to reflect the declaration order of the members. It inserts the code within the body of the constructor prior to any explicit user code.
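The reordering can be made visible by logging each member's construction. In this sketch the classes Tracer and Ordered are hypothetical. The initialization list names _second before _first, yet the log records the declaration order.

```cpp
#include <string>
#include <vector>

// Records its name in the log when constructed.
struct Tracer {
    Tracer( const char *name, std::vector<std::string> &log ) { log.push_back( name ); }
};

class Ordered {
public:
    // The list is written as second, first; construction still follows
    // the declaration order of the members below.
    explicit Ordered( std::vector<std::string> &log )
        : _second( "second", log ), _first( "first", log ) {}

private:
    Tracer _first;
    Tracer _second;
};
```

GCC and Clang report this kind of mismatch under -Wreorder, which is part of -Wall.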

Object Lessons

There are three general flavors of transformations required by any object model component:

  1. Implementation-dependent transformations. These are implementation-specific aspects and vary across compilers.
  2. Language semantics transformations. These include constructor/destructor synthesis and augmentation, memberwise initialization and memberwise copy support, and the insertion within program code of conversion operators, temporaries, and constructor/destructor calls.
  3. Code and object model transformations. These include support for virtual functions, virtual base classes and inheritance in general, operators new and delete, arrays of class objects, local static class instances, and the static initialization of global objects with nonconstant expressions.

There are two aspects to the C++ Object Model:

  1. Direct support for object-oriented programming provided within the language.
  2. The underlying mechanisms by which this support is implemented.

Determining when to provide a copy constructor, and when not, is not something one should guess at or have adjudicated by some language guru. It should come from an understanding of the Object Model.

Differences between C Macro and Inline Function:

  • Macro invocations do not perform type checking, or even check that arguments are well-formed, whereas function calls usually do.
  • A macro cannot return anything other than the result of the last expression evaluated inside it.
  • Since C macros use mere textual substitution, this may result in unintended side-effects and inefficiency due to re-evaluation of arguments and order of operations.
  • Compiler errors within macros are often difficult to understand, because they refer to the expanded code, rather than the code the programmer typed.
  • Debugging information for inlined code is usually more helpful than that of macro-expanded code.
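The re-evaluation problem is easy to reproduce. In this sketch the macro SQUARE_MACRO, the function square_inline, and the helper next_value are all hypothetical names chosen for illustration.

```cpp
// Textual substitution: the argument expression appears twice in the expansion.
#define SQUARE_MACRO( x ) ( ( x ) * ( x ) )

// The argument is evaluated once, then passed by value.
inline int square_inline( int x ) { return x * x; }

int g_calls = 0;

// Has a side effect: counts its own calls and returns the running count.
int next_value() { return ++g_calls; }
```

SQUARE_MACRO( next_value() ) calls next_value() twice and multiplies 1 by 2, whereas square_inline( next_value() ) calls it once and squares a single value.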

As you will see, the primary layout and access-time overheads within C++ are associated with the virtuals, that is,

  • The virtual function mechanism in its support of an efficient run-time binding.
  • Virtual base class in its support of a single, shared instance of a base class occurring multiple times within an inheritance hierarchy.

The C++ Object Model

Suppose we have the following class:

class Point
{
public:
    Point( float xval );
    virtual ~Point();

    float x() const;
    static int PointCount();

protected:
    virtual ostream& print( ostream &os ) const;

    float _x;
    static int _point_count;
};

A Simple Object Model:

In a simple model, the members themselves are not placed within the object. Only pointers addressing the members are placed within the object. Doing this avoids problems from members’ being quite different types and requiring different amounts (and sometimes different types of) storage. Members within an object are addressed by their slot’s index.

Although this model is not used in practice, this simple concept of an index or slot number is the one that has been developed into the C++ pointer-to-member concept.
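A pointer to data member names a member of the class rather than a location in any one object, much as a slot index would. A minimal sketch, assuming a hypothetical struct Point3dM with public coordinates:

```cpp
// Hypothetical struct with public data members.
struct Point3dM { float x, y, z; };

// Reads whichever coordinate the pointer to member selects.
inline float coordinate( const Point3dM &pt, float Point3dM::*which ) {
    return pt.*which;
}
```

For example, coordinate( p, &Point3dM::y ) reads p.y, and the same pointer-to-member value works with any other Point3dM object.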

A Table-driven Object Model:

For an implementation to maintain a uniform representation for the objects of all classes, an alternative object model might factor out all member specific information, placing it in a data member and member function pair of tables. The class object contains the pointers to the two member tables. The member function table is a sequence of slots, with each slot addressing a member. The data member table directly holds the data.

Although this model is not used in practice within C++, the concept of a member function table has been the traditional implementation supporting efficient runtime resolution of virtual functions.

The C++ Object Model:

Stroustrup’s original (and still prevailing) C++ Object Model is derived from the simple object model by optimizing for space and access time. Nonstatic data members are allocated directly within each class object. Static data members are stored outside the individual class object. Static and nonstatic function members are also hoisted outside the class object. Virtual functions are supported in two steps:

  1. A table of pointers to virtual functions is generated for each class (this is called the virtual table).
  2. A single pointer to the associated virtual table is inserted within each class object (traditionally, this has been called the vptr). The setting, resetting, and not setting of the vptr is handled automatically through code generated within each class constructor, destructor, and copy assignment operator. The type_info object associated with each class in support of runtime type identification (RTTI) is also addressed within the virtual table, usually within the table’s first slot.

Figure below illustrates the general C++ Object Model for our Point class. The primary strength of the C++ Object Model is its space and runtime efficiency. Its primary drawback is the need to recompile unmodified code that makes use of an object of a class for which there has been an addition, removal, or modification of the nonstatic class data members. (The two table model, for example, offers more flexibility by providing an additional level of indirection. But it does this at the cost of space and runtime efficiency.)
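That static data members live outside the individual object can be checked directly. This sketch uses two hypothetical structs that differ only in the static member. It assumes nothing beyond the layout rules described above.

```cpp
// Hypothetical structs: identical except for the static data member.
struct WithStaticCount    { float _x; static int _point_count; };
struct WithoutStaticCount { float _x; };
int WithStaticCount::_point_count = 0;

// The static member adds nothing to the size of each object.
inline bool static_lives_outside_object() {
    return sizeof( WithStaticCount ) == sizeof( WithoutStaticCount );
}

// Every object refers to the same single instance of the static member.
inline bool static_is_shared( WithStaticCount &a, WithStaticCount &b ) {
    a._point_count = 5;
    return b._point_count == 5 && &a._point_count == &b._point_count;
}
```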

Adding Inheritance

In the case of virtual inheritance, only a single occurrence of the base class is maintained (called a subobject) regardless of how many times the class is derived from within the inheritance chain. iostream, for example, contains only a single instance of the virtual ios base class.
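The single shared subobject can be demonstrated by comparing addresses. The classes below are hypothetical stand-ins for the iostream hierarchy, not the library types themselves. The Nv variants repeat the hierarchy without the virtual keyword for contrast.

```cpp
// Stand-ins for the iostream hierarchy, using virtual inheritance.
struct Ios      { int state; };
struct Istream  : virtual Ios {};
struct Ostream  : virtual Ios {};
struct Iostream : Istream, Ostream {};

// The same shape with nonvirtual inheritance.
struct NvIstream  : Ios {};
struct NvOstream  : Ios {};
struct NvIostream : NvIstream, NvOstream {};

// True when both inheritance paths lead to the same Ios subobject.
inline bool shares_base( Iostream &s ) {
    Ios *via_in  = static_cast<Istream *>( &s );
    Ios *via_out = static_cast<Ostream *>( &s );
    return via_in == via_out;
}

inline bool shares_base( NvIostream &s ) {
    Ios *via_in  = static_cast<NvIstream *>( &s );
    Ios *via_out = static_cast<NvOstream *>( &s );
    return via_in == via_out;
}
```

shares_base returns true for an Iostream and false for an NvIostream, which carries two separate Ios subobjects.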

How might a derived class internally model its base class instance? In a simple base class object model, each base class might be assigned a slot within the derived class object. Each slot holds the address of the base class subobject. The primary drawback to this scheme is the space and access-time overhead of the indirection. A benefit is that the size of the class object is unaffected by changes in the size of its associated base classes.

Alternatively, one can imagine a base table model. Here, a base class table is generated for which each slot contains the address of an associated base class, much as the virtual table holds the address of each virtual function. Each class object contains a bptr initialized to address its base class table. The primary drawback to this strategy, of course, is both the space and access-time overhead of the indirection. One benefit is a uniform representation of inheritance within each class object. Each class object would contain a base table pointer at some fixed location regardless of the size or number of its base classes. A second benefit would be the ability to grow, shrink, or otherwise modify the base class table without changing the size of the class objects themselves.

Understanding the C++ Object Model lets the programmer know what the final code looks like after the compiler has applied its C++ code transformations.

A C program’s trick is sometimes a C++ program’s trap. One example of this is the use of a one-element array at the end of a struct to allow individual struct objects to address variable-sized arrays:

struct mumble {
    /* stuff */
    char pc[ 1 ];
};

// grab a string from file or standard input
// allocate memory both for struct & string
struct mumble *pmumb1 = ( struct mumble* ) malloc( sizeof( struct mumble ) + strlen( string ) + 1 );
strcpy( pmumb1->pc, string );

This may or may not translate well when placed within a class declaration that

  • Specifies multiple access sections containing data,
  • Derives from another class or is itself the object of derivation, or
  • Defines one or more virtual functions.

If a programmer absolutely needs a data portion of an arbitrarily complex C++ class to have the look and feel of an equivalent C declaration, that portion is best factored out into an independent struct declaration. The original idiom for combining this C portion with its C++ part was to derive the C++ part from the C struct:

struct C_point { … };

class Point : public C_point { … };

Thus supporting both the C and C++ usage:

extern void draw_line( Point, Point );
extern "C" void draw_rect( C_point, C_point );

draw_line( Point( 0, 0 ), Point( 100, 100 ));
draw_rect( Point( 0, 0 ), Point( 100, 100 ));

This idiom is no longer recommended, however, because of changes to the class inheritance layout in some compilers (for example, the Microsoft C++ compiler) in support of the virtual function mechanism. Composition, rather than inheritance, is the only portable method of combining C and C++ portions of a class (the conversion operator provides a handy extraction method):

struct C_point { … };

class Point {
public:
    operator C_point() { return _c_point; }
    // …
private:
    C_point _c_point;
    // …
};

One reasonable use of the C struct in C++, then, is when you want to pass all or part of a complex class object to a C function. This struct declaration serves to encapsulate that data and guarantees a compatible C storage layout. This guarantee, however, is maintained only under composition. Under inheritance, the compiler decides whether additional data members are inserted within the base struct subobject.
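A complete version of the composition idiom might look like the sketch below. The fields x and y, the class name PointC, and the function sum_coordinates are assumptions made for illustration. The original declarations leave the struct contents unspecified.

```cpp
#include <type_traits>

// Hypothetical C-compatible struct; the field names are assumed.
struct C_point_s { int x; int y; };

// Stands in for a function implemented in C that takes the struct by value.
extern "C" int sum_coordinates( C_point_s p ) { return p.x + p.y; }

class PointC {
public:
    PointC( int x, int y ) { _c_point.x = x; _c_point.y = y; }

    // Extracts the C portion when a C_point_s is required.
    operator C_point_s() const { return _c_point; }

private:
    C_point_s _c_point;
};

// The composed struct keeps a layout that C code can read.
static_assert( std::is_standard_layout<C_point_s>::value, "C-compatible layout" );
```

With these definitions, sum_coordinates( PointC( 3, 4 ) ) converts the class object through the conversion operator and returns 7.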

The memory requirements to represent a class object in general are the following:

  • The accumulated size of its nonstatic data members.
  • Plus any padding (between members or on the aggregate boundary itself) due to alignment constraints (or simple efficiency).
  • Plus any internally generated overhead to support the virtuals.

The memory requirement to represent a pointer, however, is a fixed size regardless of the type it addresses.

But how, then, does a pointer to a ZooAnimal differ from, say, a pointer to an integer or a pointer to a template Array instantiated with a String?

ZooAnimal *px;
int *pi;
Array< String > *pta;

In terms of memory requirements, there is generally no difference: all three need to be allocated sufficient memory to hold a machine address (usually a machine word). So the difference between pointers to different types rests neither in the representation of the pointer nor in the values (addresses) the pointers may hold. The difference lies in the type of object being addressed. That is, the type of a pointer instructs the compiler as to how to interpret the memory found at a particular address and also just how much memory that interpretation should span.
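The point can be checked with sizeof. In this sketch ZooAnimalS is a hypothetical class, and std::vector< std::string > stands in for the Array< String > template. Equal pointer sizes are typical of mainstream platforms rather than something the language guarantees.

```cpp
#include <string>
#include <vector>

// Hypothetical polymorphic class with some state.
struct ZooAnimalS {
    virtual ~ZooAnimalS() {}
    int         loc;
    std::string name;
};

// True when all three pointer types occupy the same amount of storage.
inline bool same_pointer_size() {
    return sizeof( ZooAnimalS * ) == sizeof( int * )
        && sizeof( int * ) == sizeof( std::vector<std::string> * );
}

// The objects being addressed, in contrast, differ in size.
inline bool pointee_sizes_differ() {
    return sizeof( ZooAnimalS ) > sizeof( int );
}
```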

When a base class object is directly initialized or assigned with a derived class object, the derived object is sliced to fit into the available memory resources of the base type. There is nothing of the derived type remaining. Polymorphism is not present, and an observant compiler can resolve an invocation of a virtual function through the object at compile time, thus by-passing the virtual mechanism. This can be a significant performance win if the virtual function is defined as inline.
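Slicing is easiest to see by passing the same derived object once by value and once by reference. A minimal sketch with hypothetical classes ZooAnimalV and BearV:

```cpp
#include <string>

struct ZooAnimalV {
    virtual ~ZooAnimalV() {}
    virtual std::string kind() const { return "ZooAnimal"; }
};

struct BearV : ZooAnimalV {
    std::string kind() const override { return "Bear"; }
};

// The parameter is a base class object: the argument is sliced on the way in.
inline std::string kind_by_value( ZooAnimalV za ) { return za.kind(); }

// The parameter refers to the original object: polymorphism is preserved.
inline std::string kind_by_reference( const ZooAnimalV &za ) { return za.kind(); }
```

Given a BearV object b, kind_by_value( b ) yields "ZooAnimal" while kind_by_reference( b ) yields "Bear".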

C++ also supports a concrete ADT style of programming now called object-based (OB): nonpolymorphic data types, such as a String class. A String class exhibits a nonpolymorphic form of encapsulation; it provides a public interface and private implementation (both of state and algorithm) but does not support type extension. An OB design can be faster and more compact than an equivalent OO design: faster because all function invocations are resolved at compile time and object construction need not set up the virtual mechanism, and more compact because each class object need not carry the additional overhead traditionally associated with the support of the virtual mechanism. However, an OB design is also less flexible.

Network Programming with Windows Sockets

Named pipes and mailslots are suitable for interprocess communication between processes on the same computer or processes on Windows computers connected by a local or wide area network.

Named pipes and mailslots (both simply referred to here as “named pipes” unless the distinction is important) have the distinct drawback, however, of not being an industry standard. This is the case even though named pipes are protocol-independent and can run over industry-standard protocols such as TCP/IP.

Windows provides interoperability by supporting Windows Sockets, which are nearly the same as, and interoperable with, Berkeley Sockets, a de facto industry standard.

Winsock, because of conformance to industry standards, has naming conventions and programming characteristics somewhat different from the Windows functions described so far. The Winsock API is not strictly a part of the Windows API. Winsock also provides additional functions that are not part of the standard; these functions are used only as absolutely required. Among other advantages, programs will be more portable to other operating systems.

The Winsock API was developed as an extension of the Berkeley Sockets API into the Windows environment, and all Windows versions support Winsock. Winsock’s benefits include the following:

  • Porting code already written for Berkeley Sockets is straightforward.
  • Windows machines easily integrate into TCP/IP networks, both IPv4 and IPv6. IPv6, among other features, allows for longer IP addresses, overcoming the 4-byte address limit of IPv4.
  • Sockets can be used with Windows overlapped I/O, which, among other things, allows servers to scale when there is a large number of active clients.
  • Windows provides non-portable extensions.
  • Sockets support protocols other than TCP/IP, particularly Asynchronous Transfer Mode (ATM).

Differences between Winsock and Named Pipes (General):

  • Named pipes can be message-oriented, which can simplify programs.
  • Named pipes require ReadFile and WriteFile, whereas sockets can also use send and recv.
  • Sockets, unlike named pipes, are flexible so that a user can select the protocol to use with a socket, such as TCP or UDP. The user can also select protocols based on quality of service and other factors.
  • Sockets are based on an industry standard, allowing interoperability with non-Windows machines.
  • Named pipes do not have explicit port numbers and are distinguished by name.

Differences between Winsock and Named Pipes (Server Side):

  • When using sockets, call accept repetitively to connect to multiple clients. Each call will return a different connected socket.
  • Named pipes require you to create each named pipe instance with CreateNamedPipe, whereas accept creates socket instances.
  • There is no upper bound on the number of socket clients, but there can be a limit on the number of named pipe instances, depending on the first call to CreateNamedPipe.
  • There are no WinSock convenience functions comparable to TransactNamedPipe.

A named pipe server requires two function calls (CreateNamedPipe and ConnectNamedPipe), whereas a socket server requires four function calls (socket, bind, listen, and accept).
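The four-call server sequence can be sketched as follows. This is an illustration under stated assumptions, not code from the text: the names socket_t, kInvalidSocket, close_socket, and open_listener are hypothetical. The Windows branch adds the WSAStartup and closesocket calls that Winsock needs and must be linked with ws2_32. The non-Windows branch uses Berkeley Sockets so that the same sketch builds elsewhere.

```cpp
#ifdef _WIN32
  #include <winsock2.h>
  #include <ws2tcpip.h>
  typedef SOCKET socket_t;
  static const socket_t kInvalidSocket = INVALID_SOCKET;
  inline void close_socket( socket_t s ) { closesocket( s ); }
#else
  #include <arpa/inet.h>
  #include <netinet/in.h>
  #include <sys/socket.h>
  #include <unistd.h>
  typedef int socket_t;
  static const socket_t kInvalidSocket = -1;
  inline void close_socket( socket_t s ) { close( s ); }
#endif
#include <cstring>

// Returns a TCP socket listening on 127.0.0.1 at a port chosen by the
// operating system, or kInvalidSocket on failure.
inline socket_t open_listener() {
#ifdef _WIN32
    WSADATA wsa;   // Winsock must be initialized once per process
    if ( WSAStartup( MAKEWORD( 2, 2 ), &wsa ) != 0 ) return kInvalidSocket;
#endif
    socket_t s = socket( AF_INET, SOCK_STREAM, IPPROTO_TCP );          // 1. socket
    if ( s == kInvalidSocket ) return kInvalidSocket;

    sockaddr_in addr;
    std::memset( &addr, 0, sizeof addr );
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl( INADDR_LOOPBACK );
    addr.sin_port        = htons( 0 );                                 // 0 = any free port

    if ( bind( s, reinterpret_cast<const sockaddr *>( &addr ), sizeof addr ) != 0   // 2. bind
         || listen( s, SOMAXCONN ) != 0 ) {                            // 3. listen
        close_socket( s );
        return kInvalidSocket;
    }
    return s;   // 4. the caller loops on accept( s, nullptr, nullptr )
}
```

Each accept call on the returned socket blocks until a client connects and then yields a new connected socket for that client, which is why the sketch leaves the fourth call to the caller.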

Differences between Winsock and Named Pipes (Client Side):

  • Named pipes use WaitNamedPipe followed by CreateFile. The socket sequence is in the opposite order because the function socket can be regarded as the creation function, while connect is the blocking function.
  • An additional distinction is that connect is a socket client function, while the similarly named ConnectNamedPipe is a server function.

A thread-persistence problem may occur when multiple clients are served by a DLL that contains the Winsock processing code (accept and recv). There are two proposed solutions:

  • Using Thread Local Storage (TLS). This solution dictates that each thread must handle a specific accept request at a time.
  • Encapsulate each accept request into a structure and then pass this structure to the thread.

Datagrams

Datagrams are similar to mailslots and are used in similar circumstances. There is no connection between the sender and receiver, and there can be multiple receivers. Delivery to the receiver is not ensured with either mailslots or datagrams, and successive messages will not necessarily be received in the order they were sent.