In this QuickStart guide, we will focus on finding the bottleneck (the part of the code that takes the longest to run) in a program. As an example, we will use the kruse program from Programming Assignment 1.
unix% g++ -g -pg -o kruse kruse.ccThis creates an executable called kruse from the source file kruse.cc with debugging and profiling turned on.
Another way to do this is to add -pg to the CFLAGS line in the file called Makefile.
Things to keep in mind:
unix% gprof program-name [ data-file ] [ > output-file ]If you don't specify the name of a data file, gmon.out is assumed. Following the gprof command with "> output-file" causes the output of gprof to be saved to output-file so you can examine it later.
For this example, the program name is kruse and we will save the output into a file called kruse.output:
unix% gprof kruse > kruse.output
Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls us/call us/call name 37.50 0.15 0.15 48000 3.12 3.12 Life::neighbor_count(int, int) 17.50 0.22 0.07 _IO_do_write 10.00 0.26 0.04 __overflow 7.50 0.29 0.03 _IO_file_overflow 7.50 0.32 0.03 _IO_putc 5.00 0.34 0.02 12 1666.67 14166.67 Life::update(void) 5.00 0.36 0.02 stdiobuf::overflow(int) 5.00 0.38 0.02 stdiobuf::sys_write(char const *, int) 2.50 0.39 0.01 ostream::operator<<(char) 2.50 0.40 0.01 internal_mcount 0.00 0.40 0.00 12 0.00 0.00 Life::print(void) 0.00 0.40 0.00 12 0.00 0.00 to_continue(void) 0.00 0.40 0.00 1 0.00 0.00 Life::initialize(void) 0.00 0.40 0.00 1 0.00 0.00 instructions(void) 0.00 0.40 0.00 1 0.00 170000.00 mainNote that the functions mcount and profil (profil does not appear in this listing) are part of the profiling aparatus; their time gives a measure of the amount of overhead due to profiling. Also note that functions like stdiobuf::sys_write and _IO_do_write are part of the system libraries and not directly part of your code.
In this output, we can see that 37.5% of kruse's execution time is spent in Life::neighbor_count. This is the highest percentage for any function in the program. It is also worthwhile to note that it gets called 48,000 times. This is our first hint that Life::neighbor_count might be the biggest bottleneck in kruse.
Here is some of the output from the call graph:
Call graph (explanation follows) granularity: each sample hit covers 4 byte(s) for 2.50% of 0.40 seconds index % time self children called name 0.02 0.15 12/12 main [2] [1] 42.5 0.02 0.15 12 Life::update(void) [1] 0.15 0.00 48000/48000 Life::neighbor_count(int, int) [4] ----------------------------------------------- 0.00 0.17 1/1 _start [3] [2] 42.5 0.00 0.17 1 main [2] 0.02 0.15 12/12 Life::update(void) [1] 0.00 0.00 12/12 Life::print(void) [13] 0.00 0.00 12/12 to_continue(void) [14] 0.00 0.00 1/1 instructions(void) [16] 0.00 0.00 1/1 Life::initialize(void) [15] -----------------------------------------------The lines full of dashes divide this table into entries, one for each function. Each entry has one or more lines.[3] 42.5 0.00 0.17 _start [3] 0.00 0.17 1/1 main [2] ----------------------------------------------- 0.15 0.00 48000/48000 Life::update(void) [1] [4] 37.5 0.15 0.00 48000 Life::neighbor_count(int, int) [4] -----------------------------------------------
In each entry, the primary line is the one that starts with an index number in square brackets. The end of this line says which function the entry is for.
The preceding lines in the entry describe the callers of this function and the following lines describe its subroutines (also called children when we speak of the call graph). If the caller of a function cannot be determined, <spontaneous> is printed instead.
The entries are sorted by time spent in the function and its subroutines.
In this example, we see that the first entry is for Life::update, the second entry is for main, and so on. 42.5% of the program's execution time is spent in Life::update and its children. Life::update only has one child, Life::neighbor_count. In the fourth entry, we see that Life::neighbor_count consumes 37.5% of the program's execution time and has no children. As in the flat profile, the call graph shows that Life::neighbor_count was called 48,000 times.
Based on this information and what we observed in the flat profile, we can conclude that Life::neighbor_count is the main bottleneck in kruse.