STL

The Standard Template Library (STL) is a powerful combination of containers and generic algorithms. From a performance perspective, a few questions immediately come to mind:

  • The STL comes bundled with performance guarantees of the asymptotic complexity of the various containers and algorithms. What does it really mean?
  • The STL consists of many containers. Faced with a given computational task, what containers should I use? Are some better than others for a given scenario?
  • How good is the performance of the STL? Can I do better by rolling my own home-grown containers and algorithms?\

Insertion Performance:

Steps taken in vector expansion:

  • Allocate a larger memory block to make room for additional elements.
  • Copy the existing collection elements to the newly allocated memory. The copy constructor is invoked for each element in the old collection.
  • Destroy the old collection and free its memory. A destructor would be invoked for each element of the old collection copy.

To see how the copy constructor cost the performance see the comparison below of BigInt class:

BigInt::BigInt(const
BigInt& copyFrom) // Copy constructor

{

    size = ndigits = copyFrom.ndigits;

    digits = new
char[size];

    for ( unsigned
i = 0; i < ndigits; ++i) {

        digits[i] = copyFrom.digits[i];

    }

}

The dramatic difference in performance is a reflection of the difference between the cost of copying and destroying plain integers and that of BigInt objects. If you find yourself in a situation where the object’s copy constructor and destructor are fairly expensive, and vector capacity growth is very likely, you could still circumvent the cost by storing pointers instead of objects. Pointers to objects do not have associated constructors and destructor. The cost of copying a pointer is essentially identical to an integer copy. The vector insertion time of a million BigInt pointers was practically the same as the integer case and is given in this figure:

The list container does not hold its elements in contiguous memory and consequently does not have to contend with capacity issues and their associated performance overhead. As a result, the list container does better than a vector on the BigInt insertion test. See figure below:

The previous figure brings up another important point: Each container has its strengths and weaknesses. Although the vector is superior to the list on insertion of a million integers, the list outperforms the vector on a similar test involving BigInt objects.

Using reserve() will improve vector insertion dramatically:

The inserting at front cost:

Comparing list to vector deletion at the back:

Deleting elements at the front:

Container traversal speed:

With regard to container traversal, the performance of the vector and array was the same. Both outperformed the list container by a wide margin. The dominant factor in container traversal seemed to be the interaction between the container memory layout and the system’s cache. Both vector and array hold their collections in contiguous memory. Logically adjacent elements of the collection reside physically next to one another in memory. When a particular element is loaded into the cache, it pulls with it a few more adjacent elements that will be accessed next (the exact number depends on the element size as well as the size of a cache line). This is ideal behavior from a caching perspective. This is not the case with a list container. Logically adjacent list elements are not necessarily adjacent in memory. Moreover, the list elements must store forward and backward pointers in addition to element value, which makes them larger than the corresponding vector elements. Even if some adjacent list elements were placed next to one another in memory, fewer of them would fit on a cache line due to their size. Consequently, traversal of a list container produces far more cache misses than an array or vector traversal.

Using STL find() function to search for a value inside containers:

Using STL find() Vs. multisel member function find():

The member find() utilizes the fact that the multiset is a sorted container. The generic find() is oblivious to that fact because it must handle containers that are not sorted.

Comparing function object to function pointer:

is in line with expectation. Function pointers cannot be resolved until run-time, which prevents them from being inlined. Function objects, however, are determined at compile-time, which gives the compiler the freedom to inline the calls to operator() and significantly increase efficiency.

Key Points:

  • Unless you know something about the problem domain that the STL doesn’t, it is unlikely that you will beat the performance of an STL implementation by a wide enough margin to justify the effort.

It is possible, however, to exceed the performance of an STL implementation in some specific scenarios.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s