Tuesday, August 25, 2009

Debugging GCC

Examining core files

In addition to allowing programs to be run under the debugger, an important benefit of the -g option is the ability to examine the cause of a program crash from a "core dump".

When a program exits abnormally (i.e. crashes) the operating system can write out a core file (usually named ‘core’) which contains the in-memory state of the program at the time it crashed. This file is often referred to as a core dump. Combined with information from the symbol table produced by -g, the core dump can be used to find the line where the program stopped, and the values of its variables at that point.

This is useful both during the development of software and after deployment--it allows problems to be investigated when a program has crashed "in the field".

Here is a simple program containing an invalid memory access bug, which we will use to produce a core file:

int foo (int *p);

int
main (void)
{
  int *p = 0;   /* null pointer */
  return foo (p);
}

int
foo (int *p)
{
  int y = *p;
  return y;
}

The program attempts to dereference a null pointer p, which is an invalid operation. On most systems, this will cause a crash.

In order to be able to find the cause of the crash later, we will need to compile the program with the -g option:

$ gcc -Wall -g null.c 

Note that a null pointer will only cause a problem at run-time, so the option -Wall does not produce any warnings.

Running the executable file on an x86 GNU/Linux system will cause the operating system to terminate the program abnormally:

$ ./a.out
Segmentation fault (core dumped)

Whenever the error message ‘core dumped’ is displayed, the operating system should produce a file called ‘core’ in the current directory. This core file contains a complete copy of the pages of memory used by the program at the time it was terminated. Incidentally, the term segmentation fault refers to the fact that the program tried to access a restricted memory "segment" outside the area of memory which had been allocated to it.

Some systems are configured not to write core files by default, since the files can be large and rapidly fill up the available disk space on a system. In the GNU Bash shell the command ulimit -c controls the maximum size of core files. If the size limit is zero, no core files are produced. The current size limit can be shown by typing the following command:

$ ulimit -c
0

If the result is zero, as shown above, then it can be increased with the following command to allow core files of any size to be written:

$ ulimit -c unlimited 

Note that this setting only applies to the current shell. To set the limit for future sessions the command should be placed in an appropriate login file, such as ‘.bash_profile’ for the GNU Bash shell.
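The limit that ulimit -c adjusts is the per-process resource limit which POSIX calls RLIMIT_CORE, so a program can also raise the limit for itself. The following is a minimal sketch, assuming a POSIX system; the function name enable_core_dumps is hypothetical and is not part of the example program:

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-file size limit of the
   calling process to its hard limit.  Returns 0 on success and -1
   on failure (errno is set by getrlimit or setrlimit).  */
int
enable_core_dumps (void)
{
  struct rlimit rl;

  if (getrlimit (RLIMIT_CORE, &rl) != 0)
    return -1;

  /* An unprivileged process may raise its soft limit only as far
     as the hard limit.  */
  rl.rlim_cur = rl.rlim_max;
  return setrlimit (RLIMIT_CORE, &rl);
}
```

Calling such a function at the start of main affects only that process and its children. If the hard limit is itself zero, the soft limit stays at zero and no core file will be written.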

Core files can be loaded into the GNU Debugger gdb with the following command:

$ gdb EXECUTABLE-FILE CORE-FILE 

Note that both the original executable file and the core file are required for debugging--it is not possible to debug a core file without the corresponding executable. In this example, we can load the executable and core file with the command:

$ gdb a.out core 

The debugger immediately begins printing diagnostic information, and shows a listing of the line where the program crashed (line 13):

$ gdb a.out core
Core was generated by `./a.out'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
#0  0x080483ed in foo (p=0x0) at null.c:13
13        int y = *p;
(gdb)

The final line (gdb) is the GNU Debugger prompt--it indicates that further commands can be entered at this point.

To investigate the cause of the crash, we display the value of the pointer p using the debugger print command:

(gdb) print p
$1 = (int *) 0x0

This shows that p is a null pointer (0x0) of type ‘int *’, so we know that dereferencing it with the expression *p in this line has caused the crash.
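Once the debugger has identified the null dereference, one possible source-level fix is to test the pointer before using it. The following is a minimal sketch; the name foo_checked and its error-code convention are assumptions made for illustration and are not part of null.c:

```c
#include <stddef.h>

/* Hypothetical variant of foo from null.c: instead of dereferencing
   a null pointer it reports failure to the caller.  Returns 0 and
   stores *p in *result on success, or -1 if either pointer is
   null.  */
int
foo_checked (const int *p, int *result)
{
  if (p == NULL || result == NULL)
    return -1;

  *result = *p;
  return 0;
}
```

With this version, passing the null pointer from main produces an error return that the caller can handle, rather than a segmentation fault.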


Displaying a backtrace

The debugger can also show the function calls and arguments up to the current point of execution--this is called a stack backtrace and is displayed with the command backtrace:

(gdb) backtrace
#0  0x080483ed in foo (p=0x0) at null.c:13
#1  0x080483d9 in main () at null.c:7

In this case, the backtrace shows that the crash occurred at line 13 after the function foo was called from main with an argument of p=0x0 at line 7 in ‘null.c’. It is possible to move to different levels in the stack trace, and examine their variables, using the debugger commands up and down.

Setting a breakpoint

A breakpoint stops the execution of a program and returns control to the debugger, where its variables and memory can be examined before continuing. Breakpoints can be set for specific functions, lines or memory locations with the break command.

To set a breakpoint on a specific function, use the command break function-name. For example, the following command sets a breakpoint at the start of the main function in the program above:

$ gdb a.out
(gdb) break main
Breakpoint 1 at 0x80483c6: file null.c, line 6.

The debugger will now take control of the program when the function main is called. Since the main function is the first function to be executed in a C program the program will stop immediately when it is run:

(gdb) run
Starting program: a.out
Breakpoint 1, main () at null.c:6
6         int *p = 0;   /* null pointer */
(gdb)

The display shows the line that will be executed next (the line number is shown on the left). The breakpoint stops the program before the line is executed, so at this stage the pointer p has not yet been set to zero and its value is undefined.


Stepping through the program

To move forward and execute the line displayed above, use the command step:

(gdb) step
7         return foo (p);

After executing line 6, the debugger displays the next line to be executed. The pointer p will now have been set to zero (null):

(gdb) print p
$1 = (int *) 0x0

The command step will follow the execution of the program interactively through any functions that are called in the current line. If you want to move forward without tracing these calls, use the command next instead.


Modifying variables

To temporarily fix the null pointer bug discovered above, we can change the value of p in the running program using the set variable command.

Variables can be set to a specific value, or to the result of an expression, which may include function calls. This powerful feature allows functions in a program to be tested interactively through the debugger.

In this case we will interactively allocate some memory for the pointer p using the function malloc, storing the value 255 in the resulting location:

(gdb) set variable p = malloc(sizeof(int))
(gdb) print p
$2 = (int *) 0x40013f98    (address allocated by malloc)
(gdb) set variable *p = 255
(gdb) print *p
$3 = 255

If we now continue stepping through the program with the new value of p the previous segmentation fault will not occur:

(gdb) step
foo (p=0x40013f98) at null.c:13
13        int y = *p;
(gdb) step
14        return y;
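The interactive change made in the debugger lasts only for that session. For comparison, here is a rough source-level equivalent of the same steps, offered as a sketch rather than as the intended fix; the wrapper name run_with_allocated_pointer is hypothetical:

```c
#include <stdlib.h>

/* Same body as foo in null.c.  */
int
foo (int *p)
{
  int y = *p;
  return y;
}

/* Hypothetical wrapper mirroring the debugger session: allocate
   storage for one int, store 255 in it and pass it to foo.
   Returns the value read by foo, or -1 if malloc fails.  */
int
run_with_allocated_pointer (void)
{
  int *p = malloc (sizeof (int));
  int y;

  if (p == NULL)
    return -1;

  *p = 255;
  y = foo (p);
  free (p);
  return y;
}
```

Unlike the debugger session, this version also checks the result of malloc and frees the memory once foo has returned.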


Continuing execution

The command finish continues execution up to the end of the current function, displaying the return value:

(gdb) finish
Run till exit from #0  0x08048400 in foo (p=0x40013f98) at null.c:15
0x080483d9 in main () at null.c:7
7         return foo (p);
Value returned is $13 = 255

To continue execution until the program exits (or hits the next breakpoint) use the command continue:

(gdb) continue
Continuing.
Program exited with code 0377.

Note that the exit code is shown in octal (0377 base 8 = 255 in base 10).
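The octal conversion can be checked with a few lines of C. The following is a small sketch; the function name format_exit_code is hypothetical, and masking to the low 8 bits assumes the POSIX convention that only those bits of an exit status reach the parent process:

```c
#include <stdio.h>

/* Hypothetical helper: write STATUS into BUF as a C-style octal
   constant, in the style of the exit code shown in the session
   above.  Only the low 8 bits are kept (a POSIX assumption).
   Returns the value returned by snprintf.  */
int
format_exit_code (int status, char *buf, size_t size)
{
  /* The '#' flag makes %o print a leading zero for nonzero values.  */
  return snprintf (buf, size, "%#o", (unsigned int) (status & 0xff));
}
```

For a status of 255 this writes the string 0377 into the buffer, matching the exit code reported by the debugger.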
