Tuning a Program Using prof and tcov
The example problem and all the programs are taken from
Jon Bentley's book: More Programming Pearls,
published in 1988 by Addison-Wesley.
The problem is computing and printing all the prime numbers
from 2 up to a reasonably large number. The first program below
uses 10,000 as that limit and subsequent ones use 100,000
(because they are so much faster!).
The best algorithm for the problem is probably the
`Sieve of Eratosthenes'.
But as an example, Bentley takes the brute force approach of
testing a number N for primality by checking that it cannot
be divided evenly by any integer from 2 up to N-1.
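The brute-force test described above can be sketched in C as follows. This is an illustrative sketch, not Bentley's listing: the names prime_brute and count_primes_brute are hypothetical, and the sketch counts primes rather than printing them.

```c
/* Brute-force test: n is prime if no integer in 2..n-1 divides it evenly.
   (Hypothetical name; not taken from the original program.) */
int prime_brute(int n)
{
    int i;

    if (n < 2)
        return 0;
    for (i = 2; i < n; i++)
        if (n % i == 0)
            return 0;          /* found a divisor, so n is composite */
    return 1;
}

/* Count the primes in 2..limit by testing every candidate in turn. */
int count_primes_brute(int limit)
{
    int n, count = 0;

    for (n = 2; n <= limit; n++)
        if (prime_brute(n))
            count++;
    return count;
}
```

Each prime n costs about n remainder operations here, which is why this version is only practical for a small limit such as 10,000.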
Program P1
Version 1 of the program was written and run, and tcov
was used to generate statement-level execution counts.
- Click here to see the program and these
statement execution counts.
Program P2
Version 2 of the program is an improvement based on the
observation that, to test N for primality, we need only check divisors
up to the square root of N.
- Click here to see program P2 and the
statement execution counts.
- We also ran program P2 with the prof profiler.
Click here to see the output
from prof.
Program P3
The prof output for P2 shows that the number of calls to sqrt is
very large and that the time spent computing square roots
dominates the whole program.
Examining the program, we can see that the root function is being
called every time around the for loop inside the
prime function.
If n does not change, the value of root(n) does
not change either. This leads to version P3 of the program,
where the call to root has been moved out of the loop.
- Click here to see program P3 and its
statement execution counts.
- And click here to see the output
from prof.
Program P4
The number of calls to sqrt can be reduced even further
by unrolling the for loop a few times.
The resulting program is about twice as fast.
- Click here to see program P4 and its
statement execution counts.
- And click here to see the output from
prof.
Further Improvements
Bentley observed that the condition i<=sqrt(n) can be
recast as i*i<=n.
This change eliminates all calls to the sqrt
function.
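A minimal sketch of the multiplication-based bound, with the hypothetical name prime_no_sqrt. The comparison must be "less than or equal": with a strict "less than", perfect squares such as 25 would be reported as prime.

```c
/* Square-root bound expressed with integer multiplication only.
   Assumes n is small enough that i * i does not overflow an int,
   which holds comfortably for limits such as 100,000. */
int prime_no_sqrt(int n)
{
    int i;

    if (n < 2)
        return 0;
    for (i = 2; i * i <= n; i++)
        if (n % i == 0)
            return 0;
    return 1;
}
```

This version needs neither math.h nor the math library.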
However, the profile shows that the sqrt function is
hardly important in the P4 version of the program.
The biggest gains would come from eliminating uses of the
C remainder operator %.
We can easily replace the test
if ((n % 2) == 0)
with an equivalent test based on the binary representation for
integers:
if ((n & 1) == 0)
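The two tests can be placed side by side to check that they agree. The function names below are hypothetical, and the sketch assumes n is non-negative, as it always is in this problem.

```c
/* Evenness test using the remainder operator. */
int is_even_mod(int n)
{
    return (n % 2) == 0;
}

/* Evenness test using the lowest bit of the binary representation.
   Assumes n >= 0. */
int is_even_mask(int n)
{
    return (n & 1) == 0;
}
```

Only division by a power of two can be replaced by a mask in this way; the tests against other divisors still need the remainder operator.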
The best improvement, however, would come from switching to a better
algorithm -- the Sieve of Eratosthenes.
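For comparison, a minimal sketch of the Sieve of Eratosthenes follows. It is not taken from Bentley's text; the name sieve_count is hypothetical, and the sketch counts the primes up to a limit rather than printing them.

```c
#include <stdlib.h>

/* Count the primes in 2..limit with the Sieve of Eratosthenes.
   Returns -1 if memory cannot be allocated.
   Assumes limit is well below INT_MAX (for example 100,000). */
int sieve_count(int limit)
{
    char *composite;
    int i, j, count = 0;

    if (limit < 2)
        return 0;
    composite = calloc((size_t) limit + 1, 1);   /* all entries start at 0 */
    if (composite == NULL)
        return -1;

    for (i = 2; i <= limit; i++) {
        if (composite[i])
            continue;                 /* i was crossed off by a smaller prime */
        count++;                      /* i is prime */
        if (i <= limit / i)           /* guard so that i * i cannot overflow */
            for (j = i * i; j <= limit; j += i)
                composite[j] = 1;     /* cross off the multiples of i */
    }
    free(composite);
    return count;
}
```

The sieve uses no division or remainder in its inner loop, only addition and array stores, which is the source of its advantage over the trial-division versions.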