Tuning a Program Using prof and tcov

The example problem and all the programs are taken from Jon Bentley's book: More Programming Pearls, published in 1988 by Addison-Wesley.

The problem is to compute and print all the prime numbers from 2 up to a reasonably large limit. The first program below uses 10,000 as that limit and subsequent ones use 100,000 (because they are so much faster!).

The best algorithm for the problem is probably the `Sieve of Eratosthenes'. But as an example, Bentley takes the brute force approach of testing a number N for primality by checking that it cannot be divided evenly by any integer from 2 up to N-1.

Program P1

Version 1 of the program was written and run, and tcov was used to generate statement-level execution counts.
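As a rough illustration of the brute-force approach described above, a first version might look something like the sketch below. This is not Bentley's listing; the names prime_brute and count_primes_brute are hypothetical, and the counting function stands in for the printing loop so that the result is easy to check.

```c
/* Hypothetical sketch of the brute-force test; not the book's listing. */
int prime_brute(int n)
{
    int i;

    if (n < 2)
        return 0;               /* 0 and 1 are not prime */
    for (i = 2; i < n; i++)     /* try every divisor from 2 up to n-1 */
        if (n % i == 0)
            return 0;           /* found a divisor, so n is composite */
    return 1;                   /* no divisor found, so n is prime */
}

/* Count the primes in 2..limit using the brute-force test. */
int count_primes_brute(int limit)
{
    int n, count = 0;

    for (n = 2; n <= limit; n++)
        if (prime_brute(n))
            count++;
    return count;
}
```

For a prime n the inner loop runs n-2 times, which is why a statement-level count from tcov would be dominated by the remainder test.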

Program P2

Version 2 of the program is an improvement based on the observation that, to test N for primality, we need only check divisors up to the square root of N.

Program P3

The profile of P2 shows an enormous number of calls to sqrt, and the time spent computing square roots dominates the whole program. Examining the program, we can see that the root function is called on every iteration of the for loop inside the prime function. Since n does not change during the loop, the value of root(n) does not change either. This leads to version P3 of the program, where the call to root has been moved out of the loop.

Program P4

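The number of calls to sqrt can be reduced even further by unrolling the first few iterations of the for loop, that is, by testing the smallest divisors explicitly before the square root is computed. Any number that fails one of these early tests is rejected without calling sqrt at all. The resulting program is about twice as fast.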

Further Improvements

Bentley observed that the condition i <= sqrt(n) can be recast as i*i <= n. This change eliminates all calls to the sqrt function.
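A minimal sketch of the recast loop condition is shown below. The name prime_nosqrt is hypothetical, and the code assumes n is small enough that i*i does not overflow an int, which holds comfortably for limits such as 100,000.

```c
/* Trial division with the bound expressed as i*i <= n instead of i <= sqrt(n). */
int prime_nosqrt(int n)
{
    int i;

    if (n < 2)
        return 0;
    if (n % 2 == 0)
        return n == 2;                  /* even: prime only if n is 2 */
    for (i = 3; i * i <= n; i += 2)     /* no floating point, no library call */
        if (n % i == 0)
            return 0;
    return 1;
}
```

The trade-off is one integer multiplication per iteration in place of one square root per number tested.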

However, the profile shows that the sqrt function accounts for very little of the running time in the P4 version of the program. The biggest gains would come from eliminating uses of the C remainder operator %. For example, we can easily replace the test

	if ((n % 2) == 0)

with an equivalent test based on the binary representation of integers:

	if ((n & 1) == 0)
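The equivalence of the two tests can be checked directly. The two helper names below are hypothetical, and the sketch only claims agreement for the non-negative values that arise in this problem.

```c
/* Parity test using the remainder operator. */
int is_even_mod(int n)
{
    return (n % 2) == 0;
}

/* Parity test using the lowest bit of the binary representation. */
int is_even_and(int n)
{
    return (n & 1) == 0;
}
```

Whether this helps in practice depends on the compiler, since many compilers already generate the bit test for a remainder by 2.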

The best improvement would come, however, from switching to a better algorithm -- the Sieve of Eratosthenes.
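For comparison, a minimal sketch of the sieve is given below. It is not taken from the book; the name sieve_count is hypothetical, it returns a count instead of printing so the result is easy to check, and it assumes the limit is far below INT_MAX.

```c
#include <stdlib.h>

/* Count the primes in 2..limit with the Sieve of Eratosthenes.
   Returns -1 if the working array cannot be allocated. */
int sieve_count(int limit)
{
    char *composite;
    int i, j, count = 0;

    if (limit < 2)
        return 0;
    composite = calloc((size_t) limit + 1, 1);  /* all entries start at 0 */
    if (composite == NULL)
        return -1;

    for (i = 2; i <= limit; i++) {
        if (composite[i])
            continue;                   /* already crossed out */
        count++;                        /* i is prime */
        if (i <= limit / i)             /* guard so that i * i cannot overflow */
            for (j = i * i; j <= limit; j += i)
                composite[j] = 1;       /* cross out multiples of i */
    }
    free(composite);
    return count;
}
```

The sieve uses no division or remainder in its inner loop, only addition and array stores, at the cost of one byte of memory per candidate.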