One way to attack this is to use Time Travel Debugging, where the entire program state is recorded over time so that a developer can replay the exact sequence of events. Time Travel Debugging is slow to set up, though, and the overhead it introduces alters timing, potentially changing the repro. (These issues are commonly gnarly thread or cross-process concurrency bugs.)
I took the idea of time travel debugging and stripped it to its most basic element. I wrote a tool for C++, which I call "trace travel debugging", that essentially adds lightweight logging to each function call. You can then contrast the logs from a failing run with the logs from a successful run.
I used my tool to instrument the Anti-Grain graphics library, ran my code, and got this trace:
The level of indentation reflects the callstack. Seeing the flow of execution like this can also be useful for learning how a large program works.
Details:
Each log statement is very lightweight: a single 32-bit integer written to a file handle. Because race conditions and thread concurrency issues could be the cause, I needed this tool to have minimal impact on the program. Each thread has its own file handle and output file to avoid as much blocking as possible. Buffered output dramatically reduces the cost of these many fwrite calls. The resulting overhead was sufficiently small in the test projects I used.
The Python script uses a heuristic to find all functions and methods. The script associates a 32-bit integer with the current function and writes this to a persisted lookup file to use later. It injects code that creates a class instance. The class's constructor and destructor both write traces, and so we can trace both entry and exit.
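As a rough illustration of the instrumentation step, here is a minimal sketch of a heuristic that finds function definitions, assigns each one an integer tag, and injects a tracer object. It is not the actual script: the regular expression, the class name TtdScope, the helper names instrument and save_lookup, and the JSON lookup format are all assumptions made for this example.

```python
import json
import re

# Hypothetical heuristic: a line that looks like "<stuff> name(args) {" is
# treated as the start of a function or method body.
SIGNATURE = re.compile(
    r'^\s*[\w:<>~*&\s]*?([\w:~]+)\s*\([^;{}]*\)\s*(?:const\s*)?\{\s*$'
)
# Control-flow statements have the same shape, so they are filtered out by name.
CONTROL_KEYWORDS = {"if", "for", "while", "switch", "catch"}


def instrument(source, lookup, next_tag=1):
    """Return (instrumented_source, next_unused_tag); fills `lookup` in place."""
    out = []
    for line in source.splitlines():
        out.append(line)
        match = SIGNATURE.match(line)
        if match is None or match.group(1) in CONTROL_KEYWORDS:
            continue
        lookup[next_tag] = match.group(1)
        # TtdScope is a made-up name for the injected class whose constructor
        # and destructor would each write the tag.
        out.append(f"    TtdScope ttd_scope({next_tag}u);")
        next_tag += 1
    return "\n".join(out) + "\n", next_tag


def save_lookup(lookup, path):
    """Persist the tag -> function name table for the printing step."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump({str(tag): name for tag, name in lookup.items()}, handle, indent=2)
```

A line-based heuristic like this will miss definitions whose opening brace sits on its own line, which is one reason to treat it as a starting point only.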
tracetraveldebugging_print.py simply reads the binary output and looks up each tag in the lookup file that was persisted earlier.
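The decoding step could look roughly like the sketch below. The real file format is not described here, so this example makes its own assumptions: integers are little-endian, the high bit of each 32-bit value marks a scope exit, and the lookup table is a dict from integer tag to function name. The names decode_trace, EXIT_FLAG and TAG_MASK are hypothetical.

```python
import struct

EXIT_FLAG = 0x80000000  # assumption: high bit set means "leaving this function"
TAG_MASK = 0x7FFFFFFF   # remaining bits hold the function tag


def decode_trace(data, lookup):
    """Turn raw trace bytes into indented function names, one per entry event."""
    lines = []
    depth = 0
    usable = len(data) - len(data) % 4  # ignore a truncated trailing record
    for (value,) in struct.iter_unpack("<I", data[:usable]):
        if value & EXIT_FLAG:
            depth = max(depth - 1, 0)
            continue
        tag = value & TAG_MASK
        name = lookup.get(tag, f"<unknown tag {tag}>")
        lines.append("  " * depth + name)
        depth += 1
    return lines
```

Tracking depth from entry and exit records is what produces indentation that mirrors the callstack.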
On Windows, see also Visual Studio's tracepoints, which are breakpoints that write messages to the Output window. ETW (Event Tracing for Windows) is a very low-overhead tracing mechanism that could have been used here, but it is not cross-platform, and it would require writing an ETW consumer module to output events to disk.
Usage:
1) Add the following line to the headers section of a .cpp or .h file to log: #define TTD_TRACELOG(s,n)
2) If desired, add some manual logging statements in these files. For example, TTD_TRACELOG("the height is", height); or TTD_TRACELOG("in function main, time is {TIME}", 0);
3) Add each file to a list in tracetraveldebugging.py:
Now, run tracetraveldebugging.py. It will add logging statements for each C++ function/method found by my script. Rebuild and run your program. Run tracetraveldebugging_print.py, and you will see traces for each thread.
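To contrast a failing run with a successful one, the printed traces can be compared with an ordinary text diff. The sketch below uses Python's standard difflib module. The function name contrast_traces is hypothetical, and it assumes each trace has already been read into a list of lines.

```python
import difflib


def contrast_traces(success_lines, failure_lines):
    """Return unified-diff lines showing where the failing trace diverges."""
    return list(
        difflib.unified_diff(
            success_lines,
            failure_lines,
            fromfile="success",
            tofile="failure",
            lineterm="",
        )
    )
```

Lines prefixed with "+" appear only in the failing trace and lines prefixed with "-" appear only in the successful one. Traces from separate threads are best compared per thread, since each thread writes its own file.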