In a previous article on optimization, we looked at how to read a flame graph and find areas of a program that could benefit from optimization by rewriting the source code.

This is a powerful but manual process. What if you could profile your program and feed the profiling data back to the compiler so that it optimizes the program automatically? “Profile-guided optimization” does roughly that. It won’t rewrite your source code, but it lets the compiler decide where to apply optimizations based on how the program actually behaves when it runs.

Profile-Guided Optimization for GCC

You can instrument your program to produce profiling information, then use that information to rebuild the executable with better-targeted optimizations.

With “gcc” this takes two flags: -fprofile-generate for the instrumented build and, in a later invocation, -fprofile-use for the optimized rebuild.

  1. Build the program with profiling instrumentation:
    $ gcc -fprofile-generate=path-for-profiling-files source.c -o program
  1. Run the program, ideally on typical input. It writes profile data files into the directory named in step 1, and the final step consumes them:
    $ ./program
  1. Rebuild using the profile data produced in step 2. If you use “ccache”, first clear any cached object files, then make a clean build:
    $ ccache -C
    $ gcc -fprofile-use=path-for-profiling-files source.c -o faster_program

If your build compiles and links in separate steps, pass the flag to every invocation: both the link step and the steps that only create object files.

Results

Results will vary. I experimented with some existing C++14 applications I’ve written. One, a flat-file-to-Parquet converter, improved by only about five percent over the same program built with plain -O3. Another, the “DCP” I’ve discussed before, improved by around thirty percent compared with the same program built with -O3.
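Comparisons like these need repeatable timings. A minimal sketch of two hypothetical shell helpers: best_ms runs a command five times and prints the fastest wall-clock time in milliseconds, and speedup_pct reports the percentage reduction in run time of one build relative to another. It assumes GNU date, whose %N format gives nanoseconds; taking the fastest of several runs is one simple way to reduce noise from caches and scheduling, not the method used for the numbers above.

```shell
#!/bin/sh
# Hypothetical helpers for timing two builds of the same program.
# Assumes GNU date: "%N" (nanoseconds) is not supported by BSD/macOS date.

# best_ms CMD [ARGS...]: run CMD five times with its output discarded and
# print the fastest wall-clock time in whole milliseconds.
# POSIX sh has no "local"; calling this via $(...) keeps the variables
# inside a subshell.
best_ms() {
    best=""
    for i in 1 2 3 4 5; do
        start=$(date +%s%N)
        "$@" > /dev/null
        end=$(date +%s%N)
        ms=$(( (end - start) / 1000000 ))
        # Keep the minimum: the fastest run is the least disturbed by noise.
        if [ -z "$best" ] || [ "$ms" -lt "$best" ]; then
            best=$ms
        fi
    done
    echo "$best"
}

# speedup_pct BASE_MS NEW_MS: percentage by which NEW_MS is lower than
# BASE_MS, with one decimal place; prints "n/a" if BASE_MS is zero.
speedup_pct() {
    awk -v base="$1" -v new="$2" 'BEGIN {
        if (base <= 0) print "n/a"
        else printf "%.1f\n", (base - new) * 100 / base
    }'
}
```

For example, after building a baseline with `gcc -O3 source.c -o program_O3` (a hypothetical name), `speedup_pct "$(best_ms ./program_O3)" "$(best_ms ./faster_program)"` prints the reduction in run time as a percentage.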

I ran these tests with GCC 5.4, which is not exactly the newest release. I’ll try similar tests with GCC 7 and 9.