Examining core files
In addition to allowing programs to be run under the debugger, an important benefit of the -g option is the ability to examine the cause of a program crash from a "core dump".
When a program exits abnormally (i.e. crashes) the operating system can write out a core file (usually named ‘core’) which contains the in-memory state of the program at the time it crashed. This file is often referred to as a core dump.(14) Combined with information from the symbol table produced by -g, the core dump can be used to find the line where the program stopped, and the values of its variables at that point.
This is useful both during the development of software and after deployment--it allows problems to be investigated when a program has crashed "in the field".
Here is a simple program containing an invalid memory access bug, which we will use to produce a core file:
int foo (int *p);

int
main (void)
{
  int *p = 0;   /* null pointer */
  return foo (p);
}

int
foo (int *p)
{
  int y = *p;
  return y;
}
The program attempts to dereference a null pointer p, which is an invalid operation. On most systems, this will cause a crash. (15)
In order to be able to find the cause of the crash later, we will need to compile the program with the -g option:
$ gcc -Wall -g null.c
Note that a null pointer will only cause a problem at run-time, so the option -Wall does not produce any warnings.
Running the executable file on an x86 GNU/Linux system will cause the operating system to terminate the program abnormally:
$ ./a.out
Segmentation fault (core dumped)
Whenever the error message ‘core dumped’ is displayed, the operating system should produce a file called ‘core’ in the current directory.(16) This core file contains a complete copy of the pages of memory used by the program at the time it was terminated. Incidentally, the term segmentation fault refers to the fact that the program tried to access a restricted memory "segment" outside the area of memory which had been allocated to it.
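The abnormal termination can also be observed from a parent process. The following is a minimal sketch, assuming a POSIX system; the helper name signal_from_null_dereference is hypothetical and not part of null.c. It runs the faulting load in a child process and reports which signal killed it.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that dereferences a null pointer and
   return the number of the signal that killed it (0 if it exited
   normally, -1 if fork or waitpid failed). */
int
signal_from_null_dereference (void)
{
  pid_t pid = fork ();
  if (pid < 0)
    return -1;

  if (pid == 0)
    {
      /* Child: 'volatile' keeps the compiler from removing the load. */
      volatile int *p = 0;
      int y = *p;
      _exit (y & 0xff);   /* only reached if the load did not fault */
    }

  int status;
  if (waitpid (pid, &status, 0) < 0)
    return -1;

  return WIFSIGNALED (status) ? WTERMSIG (status) : 0;
}
```

On an x86 GNU/Linux system the returned value would be expected to equal SIGSEGV, the "signal 11" that the shell reports as a segmentation fault.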
Some systems are configured not to write core files by default, since the files can be large and rapidly fill up the available disk space on a system. In the GNU Bash shell the command ulimit -c controls the maximum size of core files. If the size limit is zero, no core files are produced. The current size limit can be shown by typing the following command:
$ ulimit -c
0
If the result is zero, as shown above, then it can be increased with the following command to allow core files of any size to be written:(17)
$ ulimit -c unlimited
Note that this setting only applies to the current shell. To set the limit for future sessions, the command should be placed in an appropriate login file, such as ‘.bash_profile’ for the GNU Bash shell.
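The same limit can be adjusted from inside a program. The sketch below assumes a POSIX system and uses getrlimit and setrlimit with RLIMIT_CORE; the helper name raise_core_limit is hypothetical. It raises the soft limit only as far as the hard limit, so it has no useful effect where the hard limit is itself zero.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-file size limit to the hard
   limit for the current process.  Returns 0 on success, -1 on error. */
int
raise_core_limit (void)
{
  struct rlimit limit;

  if (getrlimit (RLIMIT_CORE, &limit) != 0)
    return -1;

  limit.rlim_cur = limit.rlim_max;   /* soft limit may not exceed hard limit */
  return setrlimit (RLIMIT_CORE, &limit);
}
```

Like the shell setting, a limit changed this way applies only to the calling process and the processes it starts afterwards.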
Core files can be loaded into the GNU Debugger gdb with the following command:
$ gdb EXECUTABLE-FILE CORE-FILE
Note that both the original executable file and the core file are required for debugging--it is not possible to debug a core file without the corresponding executable. In this example, we can load the executable and core file with the command:
$ gdb a.out core
The debugger immediately begins printing diagnostic information, and shows a listing of the line where the program crashed (line 13):
$ gdb a.out core
Core was generated by `./a.out'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
#0  0x080483ed in foo (p=0x0) at null.c:13
13        int y = *p;
(gdb)
The final line (gdb) is the GNU Debugger prompt--it indicates that further commands can be entered at this point.
To investigate the cause of the crash, we display the value of the pointer p using the debugger print command:
(gdb) print p
$1 = (int *) 0x0
This shows that p is a null pointer (0x0) of type ‘int *’, so we know that dereferencing it with the expression *p in this line has caused the crash.
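Once the cause is known, one possible source-level remedy is to test the pointer before dereferencing it. The following is only a sketch of that idea; foo_checked and its out parameter are hypothetical and do not appear in null.c.

```c
#include <stddef.h>

/* Hypothetical guarded variant of foo: returns 0 and stores *p in *out
   on success, or returns -1 without touching *out when p is null. */
int
foo_checked (const int *p, int *out)
{
  if (p == NULL)
    return -1;

  *out = *p;
  return 0;
}
```

Returning a status separately from the value lets the caller distinguish a null argument from any legitimate integer result.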
Displaying a backtrace
The debugger can also show the function calls and arguments up to the current point of execution--this is called a stack backtrace and is displayed with the command backtrace:
(gdb) backtrace
#0  0x080483ed in foo (p=0x0) at null.c:13
#1  0x080483d9 in main () at null.c:7
In this case, the backtrace shows that the crash occurred at line 13 after the function foo was called from main with an argument of p=0x0 at line 7 in ‘null.c’. It is possible to move to different levels in the stack trace, and examine their variables, using the debugger commands up and down.
Setting a breakpoint
A breakpoint stops the execution of a program and returns control to the debugger, where its variables and memory can be examined before continuing. Breakpoints can be set for specific functions, lines or memory locations with the break command.
To set a breakpoint on a specific function, use the command break function-name. For example, the following command sets a breakpoint at the start of the main function in the program above:
$ gdb a.out
(gdb) break main
Breakpoint 1 at 0x80483c6: file null.c, line 6.
The debugger will now take control of the program when the function main is called. Since the main function is the first function to be executed in a C program, the program will stop immediately when it is run:
(gdb) run
Starting program: a.out
Breakpoint 1, main () at null.c:6
6         int *p = 0;   /* null pointer */
(gdb)
The display shows the line that will be executed next (the line number is shown on the left). The breakpoint stops the program before the line is executed, so at this stage the pointer p is still uninitialized and has not yet been set to zero.
Stepping through the program
To move forward and execute the line displayed above, use the command step:
(gdb) step
7         return foo (p);
After executing line 6, the debugger displays the next line to be executed. The pointer p will now have been set to zero (null):
(gdb) print p
$1 = (int *) 0x0
The command step will follow the execution of the program interactively through any functions that are called in the current line. If you want to move forward without tracing these calls, use the command next instead.
Modifying variables
To temporarily fix the null pointer bug discovered above, we can change the value of p in the running program using the set variable command.
Variables can be set to a specific value, or to the result of an expression, which may include function calls. This powerful feature allows functions in a program to be tested interactively through the debugger.
In this case we will interactively allocate some memory for the pointer p using the function malloc, storing the value 255 in the resulting location:
(gdb) set variable p = malloc(sizeof(int))
(gdb) print p
$2 = (int *) 0x40013f98   (address allocated by malloc)
(gdb) set variable *p = 255
(gdb) print *p
$3 = 255
If we now continue stepping through the program with the new value of p the previous segmentation fault will not occur:
(gdb) step
foo (p=0x40013f98) at null.c:13
13        int y = *p;
(gdb) step
14        return y;
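For comparison, the same temporary fix can be written in source form. This is a sketch under the assumption that foo is defined exactly as in null.c; the wrapper foo_with_allocated_value is hypothetical and simply mirrors the two set variable commands.

```c
#include <stdlib.h>

/* foo as defined in null.c */
int
foo (int *p)
{
  int y = *p;
  return y;
}

/* Hypothetical helper mirroring the debugger session: allocate an int,
   store a value in it and pass it to foo.  Returns -1 if malloc fails. */
int
foo_with_allocated_value (int value)
{
  int *p = malloc (sizeof (int));
  if (p == NULL)
    return -1;

  *p = value;
  int result = foo (p);
  free (p);
  return result;
}
```

Calling foo_with_allocated_value (255) corresponds to the session above, where foo reads 255 through the newly allocated pointer instead of faulting.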
Continuing execution
The command finish continues execution up to the end of the current function, displaying the return value:
(gdb) finish
Run till exit from #0  0x08048400 in foo (p=0x40013f98) at null.c:15
0x080483d9 in main () at null.c:7
7         return foo (p);
Value returned is $13 = 255
To continue execution until the program exits (or hits the next breakpoint) use the command continue:
(gdb) continue
Continuing.
Program exited with code 0377.
Note that the exit code is shown in octal (0377 base 8 = 255 in base 10).
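The exit code is 255 because main returns the result of foo (p), which is the value stored through p. The octal form can be reproduced in C with the printf conversion "%#o", which prefixes octal output with a leading zero. The helper below is a hypothetical illustration, not part of the example program.

```c
#include <stdio.h>

/* Hypothetical helper: write an exit code into buf in C octal notation
   (leading 0).  Returns the value returned by snprintf. */
int
format_exit_code_octal (char *buf, size_t size, int code)
{
  return snprintf (buf, size, "%#o", (unsigned int) code);
}
```

Passing 255 produces the string 0377, since 3*64 + 7*8 + 7 = 255.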