Next post: Python printval

Trace travel debugging

When working in a large codebase, from time to time I've encountered bugs that are difficult to reproduce. By the time the failure has occurred, the bad state is gone. Unless you have good repro steps, stepping through with a debugger won't help.

One way to attack this is to use Time Travel Debugging, where the entire program state is recorded over time, and so a developer can re-play the exact sequence of events. Time Travel Debugging is slow to set up, though, and the overhead it introduces alters timing and potentially altering the repro. (These issues are commonly gnarly thread or cross-process concurrency bugs).

I took the idea of time travel debugging and stripped it to its most basic element. I wrote a tool for C++ that I call "trace travel debugging" that essentially adds lightweight logging to each function call. You can then contrast logs during a failure with logs during success.

I used my tool to instrument the Anti-Grain graphics library, ran my code, and got this trace:
Agg2D::Agg2D(
  void Agg2D::lineCap(LineCap cap)
  void Agg2D::lineJoin(LineJoin join)
void Agg2D::attach(Image& img)
  void Agg2D::attach(unsigned char* buf, unsigned w, unsigned h, int stride)
    void Agg2D::resetTransformations()
    void Agg2D::lineWidth(real w)
    void Agg2D::lineColor(unsigned r, unsigned g, unsigned b, unsigned a)
      void Agg2D::lineColor(Color c)
    void Agg2D::fillColor(unsigned r, unsigned g, unsigned b, unsigned a)
      void Agg2D::fillColor(Color c)
    void Agg2D::textAlignment(TextAlignment alignX, TextAlignment alignY)
    void Agg2D::clipBox(real x1, real y1, real x2, real y2)
    void Agg2D::lineCap(LineCap cap)
    void Agg2D::lineJoin(LineJoin join)
    void Agg2D::flipText(bool flip)
    void Agg2D::imageFilter(ImageFilter f)
    void Agg2D::imageResample(ImageResample f)
void Agg2D::lineWidth(real w)
void Agg2D::lineColor(unsigned r, unsigned g, unsigned b, unsigned a)
  void Agg2D::lineColor(Color c)
void Agg2D::line(real x1, real y1, real x2, real y2)
  void Agg2D::addLine(real x1, real y1, real x2, real y2)
  void Agg2D::drawPath(DrawPathFlag flag)
    void Agg2D::addStrokePath()
    void Agg2D::render(bool fillColor)
    time at Agg2D::render is (2013-11-3-22-7-54-170790)
      static void render(Agg2D& gr,
        void blend_solid_hspan(int x, int y,
          void blend_pix(value_type* p,
          void blend_pix(value_type* p,
          void blend_pix(value_type* p,
          void blend_pix(value_type* p,
        void blend_solid_hspan(int x, int y,
          void blend_pix(value_type* p,
          void blend_pix(value_type* p,
          void blend_pix(value_type* p,

The level of indentation reflects the callstack. Seeing the flow of execution like this can also be useful for learning how a large program works.

Details:

Each log statement is very lightweight - a single 32-bit integer written to a file handle. Because race conditions / thread concurrency issues could be the cause, I needed this tool to have minimal impact on the program. Each thread has its own filehandle/outputfile to avoid as much blocking as possible. Buffered output dramatically reduces the cost of these many fwrite calls. The resulting overhead was sufficiently small in the test projects I used.

The Python script uses a heuristic to find all functions and methods. The script associates a 32 bit integer with the current function and writes this to a persisted lookup file to use later. It injects code that creates a class instance. The class's constructor and destructor both write traces, and so we can trace both entry and exit.

tracetraveldebugging_print.py simply reads the binary output, finding each tag in the lookup file that was persisted earlier.

In Windows, see also the Visual Studio feature of tracepoint breakpoints that write messages to the Debug window. ETW is a very-low-overhead tracing mechanism that could have been used here, but it is not cross-platform, and requires an ETW consumer module to be written to output events to disk.

Usage:

1) Add the following line to the headers section of a .cpp or .h file to log: #define TTD_TRACELOG(s,n)

2) If desired, add some manual logging statements in these files. For example, TTD_TRACELOG("the height is", height); or TTD_TRACELOG("in function main, time is {TIME}", 0);

3) Add each file to a list in tracetraveldebugging.py:
srcs = [ '~/fastdev/fastpixel/fastpixel_v3/sdl/agg24/src', 
 '~/fastdev/fastpixel/fastpixel_v3/sdl/agg24/include/agg_pattern_filters_rgba.h', 
 '~/fastdev/fastpixel/fastpixel_v3/sdl/agg24/include/agg_pixfmt_rgb.h' ]
outputdir = '~/output'


Now, run tracetraveldebugging.py. It will add logging statements for each C++ function/method found by my script. Rebuild and run your program. Run tracetraveldebugging_print.py, and you will see traces for each thread.