Coding Optimizations

Avoid redundant computations:

for (i = 0; i < 100; i++) {

    a[i] = m*n;

}

Should be

int
k = m*n;

for (i = 0; i < 100; i++) {

    a[i] = k;

}

Caching

Caching is often associated with data and instruction caching on modern processors. But, caching opportunities appear in much wider contexts, including application-level coding and design. Caching is about remembering the results of frequent and costly computations, so that you will not have to perform those computations over and over again.

This piece of code doesn’t cach

for(… ;!done;… ) {

    done = patternMatch(pat1, pat2, isCaseSensitive());

}

This is caching

int
isSensitive = isCaseSensitive();

for(… ;!done;… ) {

    done = patternMatch(pat1, pat2, isSensitive);

}

Take the class X for example:

class
X {

public:

    X() : a(1), c(2) {}

    …

private:

    int
a;

    char
b[4096]; // buffer

    int
c;

};

The X::X() constructor initializes members a and c. A standard-conforming compiler will lay out object X in the declaration order: members a and c will be separated by 4096 bytes of member b, and will not reside on the same cache line. The X::X() constructor is likely to suffer a cache miss when it accesses c.

Since a and c are accessed by neighboring instructions, we could be more cache-friendly by placing them next to one another:

class
X {

    …

private:

    int
a;

    int
c;

    char
b[4096]; // buffer

};

a and c are more likely now to reside on the same cache line. Since a is accessed prior to c, c is almost guaranteed to reside in the data cache when we need it

Precompute

Precomputing is a close relative of caching. When you cache a result, you pay the price of computing it once in the performance-critical path. When you precompute, you don’t even pay the first time. You perform the precomputation outside the performance-critical path (say, initialization-time) and never pay the price of performing a costly computation inside the performance-critical path.

In a web server application, the server can compare inputs with specified form. So, “Accept” is different from “AcCepT”. A computation is made as follows to resolve this problem:

for (char *p = header; *p ; p++) { // Uppercase the header

    *p = toupper(*p);

}

if (0 == memcmp(header, “ACCEPT:”, 7)) {

    …

}

This is costly! Below is more smart solution:

void
initLookupTable()

{

    for (int
i = 0; i < 256; i++) {

        uppercaseTable[i] = toupper(i);

    }

}

for (char *p = header; *p ; p++) {

    *p = uppercaseTable[*p];

}

if (0 == memcmp(header, “ACCEPT:”, 7)) {

    …

}

Just call initLookupTable at the start of processing and then use it directly through processing.

The 80-20 rule has many applications: 80% of the execution scenarios will traverse only 20% of your source code, and 80% of the elapsed time will be spent in 20% of the functions encountered on the execution path. The 80-20 rule is the dominating force driving the argument that premature tuning is a sin. If you randomly tune everything you can think of, not only do you waste 80% of the effort, you will also hack the design beyond repair.

An efficient use of programmer resources is to tune those 20% of the request types that appear 80% of the time.

Remember to use lazy evaluation whenever you can.

Try to use reference local variables over pointer local variables.

Some of the common compiler optimizations are load variables in registers and inlining. Also remember that compiling the program in Release mode is “mostly” more efficient than Debug mode.

The speed differential among cache, RAM, and disk access is significant. Write cache-friendly code.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s