Now that the idea of programming is less abstract, there are a few other important concepts to know about C. Assembly language and computer processors existed before higher-level programming languages, and many modern programming concepts have evolved through time. In the same way that knowing a little about Latin can greatly improve one's understanding of the English language, knowledge of low-level programming concepts can assist the comprehension of higher-level ones. When continuing to the next section, remember that C code must be compiled into machine instructions before it can do anything.
The value "Hello, world!\n" passed to the printf() function in the previous program is a string, or technically, a character array. In C, an array is simply a list of n elements of a specific data type. A 20-character array is simply 20 adjacent characters located in memory. Arrays are also referred to as buffers. The char_array.c program is an example of a character array.
#include <stdio.h>

int main() {
   char str_a[20];
   str_a[0] = 'H';
   str_a[1] = 'e';
   str_a[2] = 'l';
   str_a[3] = 'l';
   str_a[4] = 'o';
   str_a[5] = ',';
   str_a[6] = ' ';
   str_a[7] = 'w';
   str_a[8] = 'o';
   str_a[9] = 'r';
   str_a[10] = 'l';
   str_a[11] = 'd';
   str_a[12] = '!';
   str_a[13] = '\n';
   str_a[14] = 0;
   printf(str_a);
}
The GCC compiler can also be given the -o switch to define the output file to compile to. This switch is used below to compile the program into an executable binary called char_array.
reader@hacking:~/booksrc $ gcc -o char_array char_array.c
reader@hacking:~/booksrc $ ./char_array
Hello, world!
reader@hacking:~/booksrc $
In the preceding program, a 20-element character array is defined as str_a, and each element of the array is written to, one by one. Notice that the numbering begins at 0, as opposed to 1. Also notice that the last character is a 0. (This is also called a null byte.) The character array was defined with 20 elements, so 20 bytes are allocated for it, but only 15 of these bytes are actually used (14 characters plus the null byte). The null byte at the end is used as a delimiter character to tell any function that is dealing with the string to stop operations right there. The remaining extra bytes are just garbage and will be ignored. If a null byte were written to str_a[5] in place of the comma, only the characters Hello would be printed by the printf() function.
Since setting each character in a character array is painstaking and strings are used fairly often, a set of standard functions was created for string manipulation. For example, the strcpy() function will copy a string from a source to a destination, iterating through the source string and copying each byte to the destination (and stopping after it copies the null termination byte). The order of the function's arguments is similar to Intel assembly syntax: destination first and then source. The char_array.c program can be rewritten using strcpy() to accomplish the same thing using the string library. The next version of the char_array program shown below includes string.h since it uses a string function.
#include <stdio.h>
#include <string.h>

int main() {
   char str_a[20];

   strcpy(str_a, "Hello, world!\n");
   printf(str_a);
}
Let's take a look at the char_array2 program with GDB. In the output below, the compiled program is opened with GDB and breakpoints are set before, in, and after the strcpy() call. The debugger will pause the program at each breakpoint, giving us a chance to examine registers and memory. The strcpy() function's code comes from a shared library, so the breakpoint in this function can't actually be set until the program is executed.
reader@hacking:~/booksrc $ gcc -g -o char_array2 char_array2.c
reader@hacking:~/booksrc $ gdb -q ./char_array2
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) list
1 #include <stdio.h>
2 #include <string.h>
3
4 int main() {
5 char str_a[20];
6
7 strcpy(str_a, "Hello, world!\n");
8 printf(str_a);
9 }
(gdb) break 6
Breakpoint 1 at 0x80483c4: file char_array2.c, line 6.
(gdb) break strcpy
Function "strcpy" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 2 (strcpy) pending.
(gdb) break 8
Breakpoint 3 at 0x80483d7: file char_array2.c, line 8.
(gdb)
When the program is run, the strcpy() breakpoint is resolved. At each breakpoint, we're going to look at EIP and the instructions it points to. Notice that the memory location for EIP at the middle breakpoint is different.
(gdb) run
Starting program: /home/reader/booksrc/char_array2
Breakpoint 4 at 0xb7f076f4
Pending breakpoint "strcpy" resolved
Breakpoint 1, main () at char_array2.c:7
7 strcpy(str_a, "Hello, world!\n");
(gdb) i r eip
eip 0x80483c4 0x80483c4 <main+16>
(gdb) x/5i $eip
0x80483c4 <main+16>: mov DWORD PTR [esp+4],0x80484c4
0x80483cc <main+24>: lea eax,[ebp-40]
0x80483cf <main+27>: mov DWORD PTR [esp],eax
0x80483d2 <main+30>: call 0x80482c4 <strcpy@plt>
0x80483d7 <main+35>: lea eax,[ebp-40]
(gdb) continue
Continuing.
Breakpoint 4, 0xb7f076f4 in strcpy () from /lib/tls/i686/cmov/libc.so.6
(gdb) i r eip
eip 0xb7f076f4 0xb7f076f4 <strcpy+4>
(gdb) x/5i $eip
0xb7f076f4 <strcpy+4>: mov esi,DWORD PTR [ebp+8]
0xb7f076f7 <strcpy+7>: mov eax,DWORD PTR [ebp+12]
0xb7f076fa <strcpy+10>: mov ecx,esi
0xb7f076fc <strcpy+12>: sub ecx,eax
0xb7f076fe <strcpy+14>: mov edx,eax
(gdb) continue
Continuing.
Breakpoint 3, main () at char_array2.c:8
8 printf(str_a);
(gdb) i r eip
eip 0x80483d7 0x80483d7 <main+35>
(gdb) x/5i $eip
0x80483d7 <main+35>: lea eax,[ebp-40]
0x80483da <main+38>: mov DWORD PTR [esp],eax
0x80483dd <main+41>: call 0x80482d4 <printf@plt>
0x80483e2 <main+46>: leave
0x80483e3 <main+47>: ret
(gdb)
The address in EIP at the middle breakpoint is different because the code for the strcpy() function comes from a loaded library. In fact, the debugger shows EIP for the middle breakpoint in the strcpy() function, while EIP at the other two breakpoints is in the main() function. I'd like to point out that EIP is able to travel from the main code to the strcpy() code and back again. Each time a function is called, a record is kept on a data structure simply called the stack. The stack lets EIP return through long chains of function calls. In GDB, the bt command can be used to backtrace the stack. In the output below, the stack backtrace is shown at each breakpoint.
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/reader/booksrc/char_array2
Error in re-setting breakpoint 4:
Function "strcpy" not defined.
Breakpoint 1, main () at char_array2.c:7
7 strcpy(str_a, "Hello, world!\n");
(gdb) bt
#0 main () at char_array2.c:7
(gdb) cont
Continuing.
Breakpoint 4, 0xb7f076f4 in strcpy () from /lib/tls/i686/cmov/libc.so.6
(gdb) bt
#0 0xb7f076f4 in strcpy () from /lib/tls/i686/cmov/libc.so.6
#1 0x080483d7 in main () at char_array2.c:7
(gdb) cont
Continuing.
Breakpoint 3, main () at char_array2.c:8
8 printf(str_a);
(gdb) bt
#0 main () at char_array2.c:8
(gdb)
At the middle breakpoint, the backtrace of the stack shows its record of the strcpy() call. Also, you may notice that the strcpy() function is at a slightly different address during the second run. This is due to an exploit protection method that is turned on by default in the Linux kernel since 2.6.11. We will talk about this protection in more detail later.
By default, numerical values in C are signed, which means they can be both negative and positive. In contrast, unsigned values don't allow negative numbers. Since it's all just memory in the end, all numerical values must be stored in binary, and unsigned values make the most sense in binary. A 32-bit unsigned integer can contain values from 0 (all binary 0s) to 4,294,967,295 (all binary 1s). A 32-bit signed integer is still just 32 bits, which means it can only be in one of 2^32 possible bit combinations. This allows 32-bit signed integers to range from -2,147,483,648 to 2,147,483,647. Essentially, one of the bits is a flag marking the value positive or negative. Positively signed values look the same as unsigned values, but negative numbers are stored differently using a method called two's complement. Two's complement represents negative numbers in a form suited for binary adders: when a negative value in two's complement is added to a positive number of the same magnitude, the result will be 0. This is done by first writing the positive number in binary, then inverting all the bits, and finally adding 1. It sounds strange, but it works and allows negative numbers to be added in combination with positive numbers using simple binary adders.
This can be explored quickly on a smaller scale using pcalc, a simple programmer's calculator that displays results in decimal, hexadecimal, and binary formats. For simplicity's sake, 8-bit numbers are used in this example.
reader@hacking:~/booksrc $ pcalc 0y01001001
        73              0x49            0y1001001
reader@hacking:~/booksrc $ pcalc 0y10110110 + 1
        183             0xb7            0y10110111
reader@hacking:~/booksrc $ pcalc 0y01001001 + 0y10110111
        256             0x100           0y100000000
reader@hacking:~/booksrc $
First, the binary value 01001001 is shown to be positive 73. Then all the bits are flipped, and 1 is added to result in the two's complement representation for negative 73, 10110111. When these two values are added together, the result of the original 8 bits is 0. The program pcalc shows the value 256 because it's not aware that we're only dealing with 8-bit values. In a binary adder, that carry bit would just be thrown away because the end of the variable's memory would have been reached. This example might shed some light on how two's complement works its magic.
In C, variables can be declared as unsigned by simply prepending the keyword unsigned to the declaration. An unsigned integer would be declared with unsigned int. In addition, the size of numerical variables can be extended or shortened by adding the keywords long or short. The actual sizes will vary depending on the architecture the code is compiled for. The C language provides an operator called sizeof that can determine the size of a data type. It looks like a function call that takes a data type as its input, but it is evaluated by the compiler, and it yields the size in bytes of a variable declared with that data type for the target architecture. The datatype_sizes.c program explores the sizes of various data types, using the sizeof operator.
#include <stdio.h>

int main() {
   // sizeof yields a size_t value, which is printed with %zu.
   printf("The 'int' data type is\t\t %zu bytes\n", sizeof(int));
   printf("The 'unsigned int' data type is\t %zu bytes\n", sizeof(unsigned int));
   printf("The 'short int' data type is\t %zu bytes\n", sizeof(short int));
   printf("The 'long int' data type is\t %zu bytes\n", sizeof(long int));
   printf("The 'long long int' data type is %zu bytes\n", sizeof(long long int));
   printf("The 'float' data type is\t %zu bytes\n", sizeof(float));
   printf("The 'char' data type is\t\t %zu bytes\n", sizeof(char));
}
This piece of code uses the printf() function in a slightly different way. It uses something called a format specifier to display the value produced by each sizeof expression. Format specifiers will be explained in depth later, so for now, let's just focus on the program's output.
reader@hacking:~/booksrc $ gcc datatype_sizes.c
reader@hacking:~/booksrc $ ./a.out
The 'int' data type is           4 bytes
The 'unsigned int' data type is  4 bytes
The 'short int' data type is     2 bytes
The 'long int' data type is      4 bytes
The 'long long int' data type is 8 bytes
The 'float' data type is         4 bytes
The 'char' data type is          1 bytes
reader@hacking:~/booksrc $
As previously stated, both signed and unsigned integers are four bytes in size on the x86 architecture. A float is also four bytes, while a char only needs a single byte. Of the two size keywords, only long can be applied to a floating-point type, and only in the form long double; short cannot be used with floating-point variables, and a larger floating-point variable is normally declared with double.
The EIP register is a pointer that "points" to the current instruction during a program's execution by containing its memory address. The idea of pointers is used in C, also. Since the physical memory cannot actually be moved, the information in it must be copied. It can be very computationally expensive to copy large chunks of memory to be used by different functions or in different places. This is also expensive from a memory standpoint, since space for the new destination copy must be saved or allocated before the source can be copied. Pointers are a solution to this problem. Instead of copying a large block of memory, it is much simpler to pass around the address of the beginning of that block of memory.
Pointers in C can be defined and used like any other variable type. Since memory on the x86 architecture uses 32-bit addressing, pointers are also 32 bits in size (4 bytes). Pointers are defined by prepending an asterisk (*) to the variable name. Instead of defining a variable of that type, a pointer is defined as something that points to data of that type. The pointer.c program is an example of a pointer being used with the char data type, which is only 1 byte in size.
#include <stdio.h>
#include <string.h>

int main() {
   char str_a[20];  // A 20-element character array
   char *pointer;   // A pointer, meant for a character array
   char *pointer2;  // And yet another one

   strcpy(str_a, "Hello, world!\n");
   pointer = str_a; // Set the first pointer to the start of the array.
   printf(pointer);

   pointer2 = pointer + 2; // Set the second one 2 bytes further in.
   printf(pointer2);       // Print it.
   strcpy(pointer2, "y you guys!\n"); // Copy into that spot.
   printf(pointer);        // Print again.
}
As the comments in the code indicate, the first pointer is set at the beginning of the character array. When the character array is referenced by name like this, it is treated as a pointer to its first element. This is how this buffer was passed as a pointer to the printf() and strcpy() functions earlier. The second pointer is set to the first pointer's address plus two, and then some things are printed (shown in the output below).
reader@hacking:~/booksrc $ gcc -o pointer pointer.c
reader@hacking:~/booksrc $ ./pointer
Hello, world!
llo, world!
Hey you guys!
reader@hacking:~/booksrc $
Let's take a look at this with GDB. The program is recompiled, and a breakpoint is set on line 11 of the source code. This will stop the program after the "Hello, world!\n" string has been copied into the str_a buffer and the pointer variable is set to the beginning of it.
reader@hacking:~/booksrc $ gcc -g -o pointer pointer.c
reader@hacking:~/booksrc $ gdb -q ./pointer
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) list
1 #include <stdio.h>
2 #include <string.h>
3
4 int main() {
5 char str_a[20]; // A 20-element character array
6 char *pointer; // A pointer, meant for a character array
7 char *pointer2; // And yet another one
8
9 strcpy(str_a, "Hello, world!\n");
10 pointer = str_a; // Set the first pointer to the start of the array.
(gdb)
11 printf(pointer);
12
13 pointer2 = pointer + 2; // Set the second one 2 bytes further in.
14 printf(pointer2); // Print it.
15 strcpy(pointer2, "y you guys!\n"); // Copy into that spot.
16 printf(pointer); // Print again.
17 }
(gdb) break 11
Breakpoint 1 at 0x80483dd: file pointer.c, line 11.
(gdb) run
Starting program: /home/reader/booksrc/pointer
Breakpoint 1, main () at pointer.c:11
11 printf(pointer);
(gdb) x/xw pointer
0xbffff7e0: 0x6c6c6548
(gdb) x/s pointer
0xbffff7e0: "Hello, world!\n"
(gdb)
When the pointer is examined as a string, it's apparent that the given string is there and is located at memory address 0xbffff7e0. Remember that the string itself isn't stored in the pointer variable; only the memory address 0xbffff7e0 is stored there.
In order to see the actual data stored in the pointer variable, you must use the address-of operator. The address-of operator is a unary operator, which simply means it operates on a single argument. This operator is just an ampersand (&) prepended to a variable name. When it's used, the address of that variable is returned, instead of the variable itself. This operator exists both in GDB and in the C programming language.
(gdb) x/xw &pointer
0xbffff7dc: 0xbffff7e0
(gdb) print &pointer
$1 = (char **) 0xbffff7dc
(gdb) print pointer
$2 = 0xbffff7e0 "Hello, world!\n"
(gdb)
When the address-of operator is used, the pointer variable is shown to be located at the address 0xbffff7dc in memory, and it contains the address 0xbffff7e0.
The address-of operator is often used in conjunction with pointers, since pointers contain memory addresses. The addressof.c program demonstrates the address-of operator being used to put the address of an integer variable into a pointer. The line that does this is the assignment to int_ptr.
#include <stdio.h>

int main() {
   int int_var = 5;
   int *int_ptr;

   int_ptr = &int_var; // Put the address of int_var into int_ptr.
}
The program itself doesn't actually output anything, but you can probably guess what happens, even before debugging with GDB.
reader@hacking:~/booksrc $ gcc -g addressof.c
reader@hacking:~/booksrc $ gdb -q ./a.out
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) list
1 #include <stdio.h>
2
3 int main() {
4 int int_var = 5;
5 int *int_ptr;
6
7 int_ptr = &int_var; // Put the address of int_var into int_ptr.
8 }
(gdb) break 8
Breakpoint 1 at 0x8048361: file addressof.c, line 8.
(gdb) run
Starting program: /home/reader/booksrc/a.out
Breakpoint 1, main () at addressof.c:8
8 }
(gdb) print int_var
$1 = 5
(gdb) print &int_var
$2 = (int *) 0xbffff804
(gdb) print int_ptr
$3 = (int *) 0xbffff804
(gdb) print &int_ptr
$4 = (int **) 0xbffff800
(gdb)
As usual, a breakpoint is set and the program is executed in the debugger. At this point the majority of the program has executed. The first print command shows the value of int_var, and the second shows its address using the address-of operator. The next two print commands show that int_ptr contains the address of int_var, and they also show the address of int_ptr for good measure.
An additional unary operator called the dereference operator exists for use with pointers. This operator will return the data found in the address the pointer is pointing to, instead of the address itself. It takes the form of an asterisk in front of the variable name, similar to the declaration of a pointer. Once again, the dereference operator exists both in GDB and in C. Used in GDB, it can retrieve the integer value int_ptr points to.
(gdb) print *int_ptr
$5 = 5
A few additions to the addressof.c code (shown in addressof2.c) will demonstrate all of these concepts. The added printf() functions use format parameters, which I'll explain in the next section. For now, just focus on the program's output.
#include <stdio.h>

int main() {
   int int_var = 5;
   int *int_ptr;

   int_ptr = &int_var; // Put the address of int_var into int_ptr.

   printf("int_ptr = 0x%08x\n", int_ptr);
   printf("&int_ptr = 0x%08x\n", &int_ptr);
   printf("*int_ptr = 0x%08x\n\n", *int_ptr);

   printf("int_var is located at 0x%08x and contains %d\n", &int_var, int_var);
   printf("int_ptr is located at 0x%08x, contains 0x%08x, and points to %d\n\n",
      &int_ptr, int_ptr, *int_ptr);
}
The results of compiling and executing addressof2.c are as follows.
reader@hacking:~/booksrc $ gcc addressof2.c
reader@hacking:~/booksrc $ ./a.out
int_ptr = 0xbffff834
&int_ptr = 0xbffff830
*int_ptr = 0x00000005

int_var is located at 0xbffff834 and contains 5
int_ptr is located at 0xbffff830, contains 0xbffff834, and points to 5

reader@hacking:~/booksrc $
When the unary operators are used with pointers, the address-of operator can be thought of as moving backward, while the dereference operator moves forward in the direction the pointer is pointing.
The printf() function can be used to print more than just fixed strings. This function can also use format strings to print variables in many different formats. A format string is just a character string with special escape sequences that tell the function to insert variables printed in a specific format in place of the escape sequence. The way the printf() function has been used in the previous programs, the "Hello, world!\n" string technically is the format string; however, it is devoid of special escape sequences. These escape sequences are also called format parameters, and for each one found in the format string, the function is expected to take an additional argument. Each format parameter begins with a percent sign (%) and uses a single-character shorthand very similar to formatting characters used by GDB's examine command.
Parameter | Output Type |
---|---|
%d | Decimal |
%u | Unsigned decimal |
%x | Hexadecimal |
All of the preceding format parameters receive their data as values, not pointers to values. There are also some format parameters that expect pointers, such as the following.
Parameter | Output Type |
---|---|
%s | String |
%n | Number of bytes written so far |
The %s format parameter expects to be given a memory address; it prints the data at that memory address until a null byte is encountered. The %n format parameter is unique in that it actually writes data. It also expects to be given a memory address, and it writes the number of bytes that have been written so far into that memory address.
For now, our focus will just be the format parameters used for displaying data. The fmt_strings.c program shows some examples of different format parameters.
#include <stdio.h>
#include <string.h>

int main() {
   char string[10];
   int A = -73;
   unsigned int B = 31337;

   strcpy(string, "sample");

   // Example of printing with different format string
   printf("[A] Dec: %d, Hex: %x, Unsigned: %u\n", A, A, A);
   printf("[B] Dec: %d, Hex: %x, Unsigned: %u\n", B, B, B);
   printf("[field width on B] 3: '%3u', 10: '%10u', '%08u'\n", B, B, B);
   printf("[string] %s Address %08x\n", string, string);

   // Example of unary address-of operator and a %x format string
   printf("variable A is at address: %08x\n", &A);
}
In the preceding code, additional variable arguments are passed to each printf() call for every format parameter in the format string. The final printf() call uses the argument &A, which will provide the address of the variable A. The program's compilation and execution are as follows.
reader@hacking:~/booksrc $ gcc -o fmt_strings fmt_strings.c
reader@hacking:~/booksrc $ ./fmt_strings
[A] Dec: -73, Hex: ffffffb7, Unsigned: 4294967223
[B] Dec: 31337, Hex: 7a69, Unsigned: 31337
[field width on B] 3: '31337', 10: '     31337', '00031337'
[string] sample Address bffff870
variable A is at address: bffff86c
reader@hacking:~/booksrc $
The first two calls to printf() demonstrate the printing of variables A and B, using different format parameters. Since there are three format parameters in each line, the variables A and B need to be supplied three times each. The %d format parameter allows for negative values, while %u does not, since it is expecting unsigned values.
When the variable A is printed using the %u format parameter, it appears as a very high value. This is because A is a negative number stored in two's complement, and the format parameter is trying to print it as if it were an unsigned value. Since two's complement flips all the bits and adds one, the very high bits that used to be zero are now one.
The third line in the example, labeled [field width on B], shows the use of the field-width option in a format parameter. This is just an integer that designates the minimum field width for that format parameter. However, this is not a maximum field width: if the value to be output is wider than the field width, the field width will be exceeded. This happens when 3 is used, since the output data needs 5 characters. When 10 is used as the field width, 5 spaces are output before the output data. Additionally, if a field width value begins with a 0, this means the field should be padded with zeros. When 08 is used, for example, the output is 00031337.
The fourth line, labeled [string], simply shows the use of the %s format parameter. Remember that the variable string is actually a pointer containing the address of the string, which works out wonderfully, since the %s format parameter expects its data to be passed by reference.
The final line just shows the address of the variable A, using the unary address-of operator to obtain it. This value is displayed as eight hexadecimal digits, padded by zeros.
As these examples show, you should use %d for decimal, %u for unsigned, and %x for hexadecimal values. Minimum field widths can be set by putting a number right after the percent sign, and if the field width begins with 0, it will be padded with zeros. The %s parameter can be used to print strings and should be passed the address of the string. So far, so good.
Format strings are used by an entire family of standard I/O functions, including scanf(), which basically works like printf() but is used for input instead of output. One key difference is that the scanf() function expects all of its arguments after the format string to be pointers, so the arguments must actually be variable addresses, not the variables themselves. This can be done using pointer variables or by using the unary address-of operator to retrieve the address of the normal variables. The input.c program and execution should help explain.
#include <stdio.h>
#include <string.h>

int main() {
   char message[20]; // Large enough for "Hello, world!" plus its null byte
   int count, i;

   strcpy(message, "Hello, world!");

   printf("Repeat how many times? ");
   scanf("%d", &count);

   for(i=0; i < count; i++)
      printf("%3d - %s\n", i, message);
}
In input.c, the scanf() function is used to set the count variable. The output below demonstrates its use.
reader@hacking:~/booksrc $ gcc -o input input.c
reader@hacking:~/booksrc $ ./input
Repeat how many times? 3
  0 - Hello, world!
  1 - Hello, world!
  2 - Hello, world!
reader@hacking:~/booksrc $ ./input
Repeat how many times? 12
  0 - Hello, world!
  1 - Hello, world!
  2 - Hello, world!
  3 - Hello, world!
  4 - Hello, world!
  5 - Hello, world!
  6 - Hello, world!
  7 - Hello, world!
  8 - Hello, world!
  9 - Hello, world!
 10 - Hello, world!
 11 - Hello, world!
reader@hacking:~/booksrc $
Format strings are used quite often, so familiarity with them is valuable. In addition, the ability to output the values of variables allows for debugging in the program, without the use of a debugger. Having some form of immediate feedback is fairly vital to the hacker's learning process, and something as simple as printing the value of a variable can allow for lots of exploitation.
Typecasting is simply a way to temporarily change a variable's data type, despite how it was originally defined. When a variable is typecast into a different type, the compiler is basically told to treat that variable as if it were the new data type, but only for that operation. The syntax for typecasting is as follows:
(typecast_data_type) variable
This can be used when dealing with integers and floating-point variables, as typecasting.c demonstrates.
#include <stdio.h>

int main() {
   int a, b;
   float c, d;

   a = 13;
   b = 5;

   c = a / b;                 // Divide using integers.
   d = (float) a / (float) b; // Divide integers typecast as floats.

   printf("[integers]\t a = %d\t b = %d\n", a, b);
   printf("[floats]\t c = %f\t d = %f\n", c, d);
}
The results of compiling and executing typecasting.c are as follows.
reader@hacking:~/booksrc $ gcc typecasting.c
reader@hacking:~/booksrc $ ./a.out
[integers]       a = 13  b = 5
[floats]         c = 2.000000    d = 2.600000
reader@hacking:~/booksrc $
As discussed earlier, dividing the integer 13 by 5 will round down to the incorrect answer of 2, even if this value is being stored into a floating-point variable. However, if these integer variables are typecast into floats, they will be treated as such. This allows for the correct calculation of 2.6.
This example is illustrative, but where typecasting really shines is when it is used with pointer variables. Even though a pointer is just a memory address, the C compiler still demands a data type for every pointer. One reason for this is to try to limit programming errors. An integer pointer should only point to integer data, while a character pointer should only point to character data. Another reason is for pointer arithmetic. An integer is four bytes in size, while a character only takes up a single byte. The pointer_types.c program will demonstrate and explain these concepts further. This code uses the format parameter %p to output memory addresses. This is shorthand meant for displaying pointers and, on this 32-bit system, is basically equivalent to 0x%08x.
#include <stdio.h>

int main() {
   int i;

   char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
   int int_array[5] = {1, 2, 3, 4, 5};

   char *char_pointer;
   int *int_pointer;

   char_pointer = char_array;
   int_pointer = int_array;

   for(i=0; i < 5; i++) { // Iterate through the int array with the int_pointer.
      printf("[integer pointer] points to %p, which contains the integer %d\n",
            int_pointer, *int_pointer);
      int_pointer = int_pointer + 1;
   }

   for(i=0; i < 5; i++) { // Iterate through the char array with the char_pointer.
      printf("[char pointer] points to %p, which contains the char '%c'\n",
            char_pointer, *char_pointer);
      char_pointer = char_pointer + 1;
   }
}
In this code two arrays are defined in memory, one containing integer data and the other containing character data. Two pointers are also defined, one with the integer data type and one with the character data type, and they are set to point at the start of the corresponding data arrays. Two separate for loops iterate through the arrays using pointer arithmetic to adjust the pointer to point at the next value. In the loops, when the integer and character values are actually printed with the %d and %c format parameters, notice that the corresponding printf() arguments must dereference the pointer variables. This is done using the unary * operator.
reader@hacking:~/booksrc $ gcc pointer_types.c
reader@hacking:~/booksrc $ ./a.out
[integer pointer] points to 0xbffff7f0, which contains the integer 1
[integer pointer] points to 0xbffff7f4, which contains the integer 2
[integer pointer] points to 0xbffff7f8, which contains the integer 3
[integer pointer] points to 0xbffff7fc, which contains the integer 4
[integer pointer] points to 0xbffff800, which contains the integer 5
[char pointer] points to 0xbffff810, which contains the char 'a'
[char pointer] points to 0xbffff811, which contains the char 'b'
[char pointer] points to 0xbffff812, which contains the char 'c'
[char pointer] points to 0xbffff813, which contains the char 'd'
[char pointer] points to 0xbffff814, which contains the char 'e'
reader@hacking:~/booksrc $
Even though the same value of 1 is added to int_pointer and char_pointer in their respective loops, the compiler increments the pointers' addresses by different amounts. Since a char is only 1 byte, the pointer to the next char would naturally also be 1 byte over. But since an integer is 4 bytes, a pointer to the next integer has to be 4 bytes over.
In pointer_types2.c, the pointers are swapped such that the int_pointer points to the character data and vice versa. The major changes to the code are the two pointer assignments and the format parameters used in the printf() calls.
#include <stdio.h>

int main() {
   int i;

   char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
   int int_array[5] = {1, 2, 3, 4, 5};

   char *char_pointer;
   int *int_pointer;

   char_pointer = int_array; // The char_pointer and int_pointer now
   int_pointer = char_array; // point to incompatible data types.

   for(i=0; i < 5; i++) { // Iterate through the int array with the int_pointer.
      printf("[integer pointer] points to %p, which contains the char '%c'\n",
            int_pointer, *int_pointer);
      int_pointer = int_pointer + 1;
   }

   for(i=0; i < 5; i++) { // Iterate through the char array with the char_pointer.
      printf("[char pointer] points to %p, which contains the integer %d\n",
            char_pointer, *char_pointer);
      char_pointer = char_pointer + 1;
   }
}
The output below shows the warnings spewed forth from the compiler.
reader@hacking:~/booksrc $ gcc pointer_types2.c
pointer_types2.c: In function `main':
pointer_types2.c:12: warning: assignment from incompatible pointer type
pointer_types2.c:13: warning: assignment from incompatible pointer type
reader@hacking:~/booksrc $
In an attempt to prevent programming mistakes, the compiler gives warnings about pointers that point to incompatible data types. But the compiler and perhaps the programmer are the only ones that care about a pointer's type. In the compiled code, a pointer is nothing more than a memory address, so the compiler will still compile the code if a pointer points to an incompatible data type—it simply warns the programmer to anticipate unexpected results.
reader@hacking:~/booksrc $ ./a.out
[integer pointer] points to 0xbffff810, which contains the char 'a'
[integer pointer] points to 0xbffff814, which contains the char 'e'
[integer pointer] points to 0xbffff818, which contains the char '8'
[integer pointer] points to 0xbffff81c, which contains the char '
[integer pointer] points to 0xbffff820, which contains the char '?'
[char pointer] points to 0xbffff7f0, which contains the integer 1
[char pointer] points to 0xbffff7f1, which contains the integer 0
[char pointer] points to 0xbffff7f2, which contains the integer 0
[char pointer] points to 0xbffff7f3, which contains the integer 0
[char pointer] points to 0xbffff7f4, which contains the integer 2
reader@hacking:~/booksrc $
Even though int_pointer points to character data that only contains 5 bytes of data, it is still typed as an integer pointer. This means that adding 1 to the pointer will increment the address by 4 each time. Similarly, char_pointer's address is only incremented by 1 each time, stepping through the 20 bytes of integer data (five 4-byte integers), one byte at a time. Once again, the little-endian byte order of the integer data is apparent when the 4-byte integer is examined one byte at a time. The 4-byte value of 0x00000001 is actually stored in memory as 0x01, 0x00, 0x00, 0x00.
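The byte-at-a-time view used above can also serve as a quick test of a machine's byte order. This is only a sketch; lowest_address_byte() and is_little_endian() are hypothetical helpers, not functions from the chapter's programs.

```c
// Hypothetical helper: returns the byte stored at the lowest address of value.
unsigned char lowest_address_byte(unsigned int value) {
   unsigned char *byte_pointer = (unsigned char *) &value;
   return *byte_pointer;
}

// Hypothetical helper: returns 1 on a little-endian machine, 0 otherwise.
int is_little_endian(void) {
   // On a little-endian machine, the least significant byte (0x01) comes first.
   return lowest_address_byte(1) == 1;
}
```

On the x86 system used in this chapter, is_little_endian() would be expected to return 1, since the value 1 is stored with its 0x01 byte at the lowest address.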
There will be situations like this in which you are using a pointer that points to data with a conflicting type. Since the pointer type determines the size of the data it points to, it's important that the type is correct. As you can see in pointer_types3.c below, typecasting is just a way to change the type of a variable on the fly.
#include <stdio.h>

int main() {
   int i;

   char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
   int int_array[5] = {1, 2, 3, 4, 5};

   char *char_pointer;
   int *int_pointer;

   char_pointer = (char *) int_array; // Typecast into the
   int_pointer = (int *) char_array;  // pointer's data type.

   for(i=0; i < 5; i++) { // Iterate through the char array with the int_pointer.
      printf("[integer pointer] points to %p, which contains the char '%c'\n",
            int_pointer, *int_pointer);
      int_pointer = (int *) ((char *) int_pointer + 1);
   }

   for(i=0; i < 5; i++) { // Iterate through the int array with the char_pointer.
      printf("[char pointer] points to %p, which contains the integer %d\n",
            char_pointer, *char_pointer);
      char_pointer = (char *) ((int *) char_pointer + 1);
   }
}
In this code, when the pointers are initially set, the data is typecast into the pointer's data type. This will prevent the C compiler from complaining about the conflicting data types; however, any pointer arithmetic will still be incorrect. To fix that, when 1 is added to the pointers, they must first be typecast into the correct data type so the address is incremented by the correct amount. Then this pointer needs to be typecast back into the pointer's data type once again. It doesn't look too pretty, but it works.
reader@hacking:~/booksrc $ gcc pointer_types3.c
reader@hacking:~/booksrc $ ./a.out
[integer pointer] points to 0xbffff810, which contains the char 'a'
[integer pointer] points to 0xbffff811, which contains the char 'b'
[integer pointer] points to 0xbffff812, which contains the char 'c'
[integer pointer] points to 0xbffff813, which contains the char 'd'
[integer pointer] points to 0xbffff814, which contains the char 'e'
[char pointer] points to 0xbffff7f0, which contains the integer 1
[char pointer] points to 0xbffff7f4, which contains the integer 2
[char pointer] points to 0xbffff7f8, which contains the integer 3
[char pointer] points to 0xbffff7fc, which contains the integer 4
[char pointer] points to 0xbffff800, which contains the integer 5
reader@hacking:~/booksrc $
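The cast, add, cast-back pattern can be isolated in a small function. The sketch below uses a hypothetical helper, next_int_slot(), and assumes the char pointer it receives really points at an element of an int array, so the address is suitably aligned for an int.

```c
// Hypothetical helper: advance a char pointer by the size of one int.
// Assumes p points at an element of an int array, so the cast is aligned.
char *next_int_slot(char *p) {
   // Cast to int * so that adding 1 moves sizeof(int) bytes,
   // then cast back to the pointer's declared type.
   return (char *) ((int *) p + 1);
}
```

Given a char pointer to int_array[0], next_int_slot() returns the address of int_array[1], just as the second loop in pointer_types3.c does on each pass.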
Naturally, it is far easier just to use the correct data type for pointers in the first place; however, sometimes a generic, typeless pointer is desired. In C, a void pointer is a typeless pointer, defined by the void keyword. Experimenting with void pointers quickly reveals a few things about typeless pointers. First, pointers cannot be dereferenced unless they have a type. In order to retrieve the value stored in the pointer's memory address, the compiler must first know what type of data it is. Secondly, void pointers must also be typecast before doing pointer arithmetic. These are fairly intuitive limitations, which means that a void pointer's main purpose is to simply hold a memory address.
The pointer_types3.c program can be modified to use a single void pointer by typecasting it to the proper type each time it's used. The compiler knows that a void pointer is typeless, so any type of pointer can be stored in a void pointer without typecasting. This also means a void pointer must always be typecast when dereferencing it, however. These differences can be seen in pointer_types4.c, which uses a void pointer.
#include <stdio.h>

int main() {
   int i;

   char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
   int int_array[5] = {1, 2, 3, 4, 5};

   void *void_pointer;

   void_pointer = (void *) char_array;

   for(i=0; i < 5; i++) { // Iterate through the char array with the void pointer.
      printf("[char pointer] points to %p, which contains the char '%c'\n",
            void_pointer, *((char *) void_pointer));
      void_pointer = (void *) ((char *) void_pointer + 1);
   }

   void_pointer = (void *) int_array;

   for(i=0; i < 5; i++) { // Iterate through the int array with the void pointer.
      printf("[integer pointer] points to %p, which contains the integer %d\n",
            void_pointer, *((int *) void_pointer));
      void_pointer = (void *) ((int *) void_pointer + 1);
   }
}
The results of compiling and executing pointer_types4.c are as follows.
reader@hacking:~/booksrc $ gcc pointer_types4.c
reader@hacking:~/booksrc $ ./a.out
[char pointer] points to 0xbffff810, which contains the char 'a'
[char pointer] points to 0xbffff811, which contains the char 'b'
[char pointer] points to 0xbffff812, which contains the char 'c'
[char pointer] points to 0xbffff813, which contains the char 'd'
[char pointer] points to 0xbffff814, which contains the char 'e'
[integer pointer] points to 0xbffff7f0, which contains the integer 1
[integer pointer] points to 0xbffff7f4, which contains the integer 2
[integer pointer] points to 0xbffff7f8, which contains the integer 3
[integer pointer] points to 0xbffff7fc, which contains the integer 4
[integer pointer] points to 0xbffff800, which contains the integer 5
reader@hacking:~/booksrc $
The compilation and output of pointer_types4.c are basically the same as those of pointer_types3.c. The void pointer is really just holding the memory addresses, while the hard-coded typecasting is telling the compiler to use the proper types whenever the pointer is used.
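Because a void pointer only holds an address, it is a natural parameter type for functions that should accept any kind of data. As a rough sketch, the hypothetical helper bytes_to_hex() below (not part of the chapter's programs) accepts a void pointer and a length, then typecasts the pointer to view the memory one byte at a time.

```c
#include <stdio.h>
#include <stddef.h>

// Hypothetical helper: write len bytes starting at data into out as hex text.
// out must have room for 2 * len + 1 characters.
void bytes_to_hex(const void *data, size_t len, char *out) {
   // A void pointer cannot be dereferenced, so view the memory as bytes.
   const unsigned char *bytes = (const unsigned char *) data;
   size_t i;

   for(i = 0; i < len; i++)
      sprintf(out + 2 * i, "%02x", (unsigned int) bytes[i]);
   out[2 * len] = '\0'; // Terminate the string, even when len is 0.
}
```

Any pointer type can be passed as the first argument without a typecast. On an ASCII system, passing char_array with a length of 3 should fill out with the string 616263.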
Since the type is taken care of by the typecasts, the void pointer is truly nothing more than a memory address. With the data types defined by typecasting, anything that is big enough to hold a four-byte value (the size of an address on this 32-bit system) can work the same way as a void pointer. In pointer_types5.c, an unsigned integer is used to store this address.
#include <stdio.h>

int main() {
   int i;

   char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
   int int_array[5] = {1, 2, 3, 4, 5};

   unsigned int hacky_nonpointer;

   hacky_nonpointer = (unsigned int) char_array;

   for(i=0; i < 5; i++) { // Iterate through the char array with hacky_nonpointer.
      printf("[hacky_nonpointer] points to %p, which contains the char '%c'\n",
            hacky_nonpointer, *((char *) hacky_nonpointer));
      hacky_nonpointer = hacky_nonpointer + sizeof(char);
   }

   hacky_nonpointer = (unsigned int) int_array;

   for(i=0; i < 5; i++) { // Iterate through the int array with hacky_nonpointer.
      printf("[hacky_nonpointer] points to %p, which contains the integer %d\n",
            hacky_nonpointer, *((int *) hacky_nonpointer));
      hacky_nonpointer = hacky_nonpointer + sizeof(int);
   }
}
This is rather hacky, but since this integer value is typecast into the proper pointer types when it is assigned and dereferenced, the end result is the same. Notice that instead of typecasting multiple times to do pointer arithmetic on an unsigned integer (which isn't even a pointer), the sizeof operator is used to achieve the same result using normal arithmetic.
reader@hacking:~/booksrc $ gcc pointer_types5.c
reader@hacking:~/booksrc $ ./a.out
[hacky_nonpointer] points to 0xbffff810, which contains the char 'a'
[hacky_nonpointer] points to 0xbffff811, which contains the char 'b'
[hacky_nonpointer] points to 0xbffff812, which contains the char 'c'
[hacky_nonpointer] points to 0xbffff813, which contains the char 'd'
[hacky_nonpointer] points to 0xbffff814, which contains the char 'e'
[hacky_nonpointer] points to 0xbffff7f0, which contains the integer 1
[hacky_nonpointer] points to 0xbffff7f4, which contains the integer 2
[hacky_nonpointer] points to 0xbffff7f8, which contains the integer 3
[hacky_nonpointer] points to 0xbffff7fc, which contains the integer 4
[hacky_nonpointer] points to 0xbffff800, which contains the integer 5
reader@hacking:~/booksrc $
The important thing to remember about variables in C is that the compiler is the only thing that cares about a variable's type. In the end, after the program has been compiled, the variables are nothing more than memory addresses. This means that variables of one type can easily be coerced into behaving like another type by telling the compiler to typecast them into the desired type.
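The unsigned int in pointer_types5.c only works because addresses on this system fit in four bytes. As a hedged variation, the sketch below stores the address in a uintptr_t instead, an optional but commonly available C99 type from <stdint.h> that is sized to hold a pointer. The helper names read_int_at() and read_char_at() are hypothetical, and arithmetic on the integer is assumed to behave like byte arithmetic on the address, which holds on common flat-memory platforms.

```c
#include <stdint.h>

// Hypothetical helper: treat an integer as an address and read an int from it.
int read_int_at(uintptr_t address) {
   return *((int *) address);
}

// Hypothetical helper: treat an integer as an address and read a char from it.
char read_char_at(uintptr_t address) {
   return *((char *) address);
}
```

Storing (uintptr_t) int_array in a variable and adding sizeof(int) to it steps to the next element with normal arithmetic, the same trick used with hacky_nonpointer.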
Many nongraphical programs receive input in the form of command-line arguments. Unlike inputting with scanf(), command-line arguments don't require user interaction after the program has begun execution. This tends to be more efficient and is a useful input method.
In C, command-line arguments can be accessed in the main() function by including two additional arguments to the function: an integer and a pointer to an array of strings. The integer will contain the number of arguments, and the array of strings will contain each of those arguments. The commandline.c program and its execution should explain things.
#include <stdio.h>

int main(int arg_count, char *arg_list[]) {
   int i;
   printf("There were %d arguments provided:\n", arg_count);
   for(i=0; i < arg_count; i++)
      printf("argument #%d\t-\t%s\n", i, arg_list[i]);
}

reader@hacking:~/booksrc $ gcc -o commandline commandline.c
reader@hacking:~/booksrc $ ./commandline
There were 1 arguments provided:
argument #0     -       ./commandline
reader@hacking:~/booksrc $ ./commandline this is a test
There were 5 arguments provided:
argument #0     -       ./commandline
argument #1     -       this
argument #2     -       is
argument #3     -       a
argument #4     -       test
reader@hacking:~/booksrc $
The zeroth argument is always the name of the executing binary, and the rest of the argument array (often called an argument vector) contains the remaining arguments as strings.
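The argument vector is also terminated by a null pointer: the C standard guarantees that the element just past the last argument is NULL. As a small sketch, the hypothetical helper count_args() below walks the vector until it reaches that terminator instead of relying on the integer count.

```c
#include <stddef.h>

// Hypothetical helper: count arguments by walking the vector to its NULL end.
int count_args(char *arg_list[]) {
   int count = 0;

   while(arg_list[count] != NULL) // Stop at the terminating null pointer.
      count++;
   return count;
}
```

Called from main() with the argument vector, count_args() should return the same value as the integer argument passed to main().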
Sometimes a program will want to use a command-line argument as an integer as opposed to a string. Regardless of this, the argument is passed in as a string; however, there are standard conversion functions. Unlike simple typecasting, these functions can actually convert character arrays containing numbers into actual integers. The most common of these functions is atoi(), which is short for ASCII to integer. This function accepts a pointer to a string as its argument and returns the integer value it represents. Observe its usage in convert.c.
#include <stdio.h>
#include <stdlib.h> // Needed for exit() and atoi()

void usage(char *program_name) {
   printf("Usage: %s <message> <# of times to repeat>\n", program_name);
   exit(1);
}

int main(int argc, char *argv[]) {
   int i, count;

   if(argc < 3)      // If fewer than 3 arguments are used,
      usage(argv[0]); // display usage message and exit.

   count = atoi(argv[2]); // Convert the 2nd arg into an integer.
   printf("Repeating %d times..\n", count);

   for(i=0; i < count; i++)
      printf("%3d - %s\n", i, argv[1]); // Print the 1st arg.
}
The results of compiling and executing convert.c are as follows.
reader@hacking:~/booksrc $ gcc convert.c
reader@hacking:~/booksrc $ ./a.out
Usage: ./a.out <message> <# of times to repeat>
reader@hacking:~/booksrc $ ./a.out 'Hello, world!' 3
Repeating 3 times..
  0 - Hello, world!
  1 - Hello, world!
  2 - Hello, world!
reader@hacking:~/booksrc $
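One limitation of atoi() is that it gives the caller no way to distinguish invalid input from a genuine 0. When that matters, the standard strtol() function can be used instead, since it reports where the conversion stopped. The following is a minimal sketch; parse_int() is a hypothetical helper, not part of convert.c.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

// Hypothetical helper: convert text to an int, reporting failure.
// Returns 1 and stores the value in *result on success, 0 otherwise.
int parse_int(const char *text, int *result) {
   char *end;
   long value;

   errno = 0;
   value = strtol(text, &end, 10);
   if(end == text || *end != '\0') // No digits at all, or trailing junk
      return 0;
   if(errno == ERANGE || value < INT_MIN || value > INT_MAX)
      return 0;                    // Out of range for an int
   *result = (int) value;
   return 1;
}
```

In a program like convert.c, a return value of 0 from parse_int() could be treated the same way as a missing argument, by displaying the usage message and exiting.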
In the preceding code, an if statement makes sure that three arguments are used before these strings are accessed. If the program tries to access memory that doesn't exist or that the program doesn't have permission to read, the program will crash. In C it's important to check for these types of conditions and handle them in program logic. If the error-checking if statement is commented out, this memory violation can be explored. The convert2.c program should make this more clear.
#include <stdio.h>
#include <stdlib.h> // Needed for exit() and atoi()

void usage(char *program_name) {
   printf("Usage: %s <message> <# of times to repeat>\n", program_name);
   exit(1);
}

int main(int argc, char *argv[]) {
   int i, count;
// if(argc < 3)      // If fewer than 3 arguments are used,
//    usage(argv[0]); // display usage message and exit.

   count = atoi(argv[2]); // Convert the 2nd arg into an integer.
   printf("Repeating %d times..\n", count);

   for(i=0; i < count; i++)
      printf("%3d - %s\n", i, argv[1]); // Print the 1st arg.
}
The results of compiling and executing convert2.c are as follows.
reader@hacking:~/booksrc $ gcc convert2.c
reader@hacking:~/booksrc $ ./a.out test
Segmentation fault (core dumped)
reader@hacking:~/booksrc $
When the program isn't given enough command-line arguments, it still tries to access elements of the argument array, even though they don't exist. This results in the program crashing due to a segmentation fault.
Memory is split into segments (which will be discussed later), and some memory addresses aren't within the boundaries of the memory segments the program is given access to. When the program attempts to access an address that is out of bounds, it will crash and die in what's called a segmentation fault. This effect can be explored further with GDB.
reader@hacking:~/booksrc $ gcc -g convert2.c
reader@hacking:~/booksrc $ gdb -q ./a.out
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) run test
Starting program: /home/reader/booksrc/a.out test
Program received signal SIGSEGV, Segmentation fault.
0xb7ec819b in ?? () from /lib/tls/i686/cmov/libc.so.6
(gdb) where
#0 0xb7ec819b in ?? () from /lib/tls/i686/cmov/libc.so.6
#1 0xb800183c in ?? ()
#2 0x00000000 in ?? ()
(gdb) break main
Breakpoint 1 at 0x8048419: file convert2.c, line 14.
(gdb) run test
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/reader/booksrc/a.out test
Breakpoint 1, main (argc=2, argv=0xbffff894) at convert2.c:14
14 count = atoi(argv[2]); // convert the 2nd arg into an integer
(gdb) cont
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0xb7ec819b in ?? () from /lib/tls/i686/cmov/libc.so.6
(gdb) x/3xw 0xbffff894
0xbffff894: 0xbffff9b3 0xbffff9ce 0x00000000
(gdb) x/s 0xbffff9b3
0xbffff9b3: "/home/reader/booksrc/a.out"
(gdb) x/s 0xbffff9ce
0xbffff9ce: "test"
(gdb) x/s 0x00000000
0x0: <Address 0x0 out of bounds>
(gdb) quit
The program is running. Exit anyway? (y or n) y
reader@hacking:~/booksrc $
The program is executed with a single command-line argument of test within GDB, which causes the program to crash. The where command will sometimes show a useful backtrace of the stack; however, in this case, the stack was too badly mangled in the crash. A breakpoint is set on main and the program is re-executed to get the value of the argument vector (0xbffff894). Since the argument vector is a pointer to a list of strings, it is actually a pointer to a list of pointers. Using the command x/3xw to examine the first three memory addresses stored at the argument vector's address shows that they are themselves pointers to strings. The first one is the zeroth argument, the second is the test argument, and the third is zero, which is out of bounds. When the program tries to access this memory address, it crashes with a segmentation fault.
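One way to keep this kind of check in a single place is to wrap argument access in a function that refuses out-of-range indexes. This is only a sketch, and get_arg() is a hypothetical helper rather than anything from convert.c.

```c
#include <stddef.h>

// Hypothetical helper: return argument number index, or NULL if it is missing.
char *get_arg(int argc, char *argv[], int index) {
   if(index < 0 || index >= argc)
      return NULL;     // Out of range, so argv[index] is never touched.
   return argv[index];
}
```

With the command line used above, get_arg(argc, argv, 2) returns NULL, which the caller can test for before handing anything to atoi().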
Another interesting concept regarding memory in C is variable scoping or context, in particular the contexts of variables within functions. Each function has its own set of local variables, which are independent of everything else. In fact, multiple calls to the same function all have their own contexts. You can use the printf() function with format strings to quickly explore this; check it out in scope.c.
#include <stdio.h>

void func3() {
   int i = 11;
   printf("\t\t\t[in func3] i = %d\n", i);
}

void func2() {
   int i = 7;
   printf("\t\t[in func2] i = %d\n", i);
   func3();
   printf("\t\t[back in func2] i = %d\n", i);
}

void func1() {
   int i = 5;
   printf("\t[in func1] i = %d\n", i);
   func2();
   printf("\t[back in func1] i = %d\n", i);
}

int main() {
   int i = 3;
   printf("[in main] i = %d\n", i);
   func1();
   printf("[back in main] i = %d\n", i);
}
The output of this simple program demonstrates nested function calls.
reader@hacking:~/booksrc $ gcc scope.c
reader@hacking:~/booksrc $ ./a.out
[in main] i = 3
        [in func1] i = 5
                [in func2] i = 7
                        [in func3] i = 11
                [back in func2] i = 7
        [back in func1] i = 5
[back in main] i = 3
reader@hacking:~/booksrc $
In each function, the variable i is set to a different value and printed. Notice that within the main() function, the variable i is 3, even after calling func1() where the variable i is 5. Similarly, within func1() the variable i remains 5, even after calling func2() where i is 7, and so forth. The best way to think of this is that each function call has its own version of the variable i.
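The same holds when a function calls itself: every call gets a fresh copy of its local variables. As an illustrative sketch, the hypothetical function record_locals() below recurses to a given depth and records each call's own value of i as the calls return.

```c
// Hypothetical helper: each call keeps its own i while deeper calls run.
// Values of i are appended to out as the calls return; *pos is the next slot.
void record_locals(int depth, int *out, int *pos) {
   int i = depth * 10; // Local to this particular call

   if(depth > 0)
      record_locals(depth - 1, out, pos); // The inner call has a separate i.
   out[*pos] = i;      // Still this call's value after the inner call returns.
   *pos = *pos + 1;
}
```

Calling record_locals(2, out, &pos) with pos starting at 0 fills out with 0, 10, and 20, in that order. The deepest call finishes first, and no call's i is disturbed by the calls nested inside it.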
Variables can also have a global scope, which means they will persist across all functions. Variables are global if they are defined at the beginning of the code, outside of any functions. In the scope2.c example code shown below, the variable j is declared globally and set to 42. This variable can be read from and written to by any function, and the changes to it will persist between functions.
#include <stdio.h>

int j = 42; // j is a global variable.

void func3() {
   int i = 11, j = 999; // Here, j is a local variable of func3().
   printf("\t\t\t[in func3] i = %d, j = %d\n", i, j);
}

void func2() {
   int i = 7;
   printf("\t\t[in func2] i = %d, j = %d\n", i, j);
   printf("\t\t[in func2] setting j = 1337\n");
   j = 1337; // Writing to j
   func3();
   printf("\t\t[back in func2] i = %d, j = %d\n", i, j);
}

void func1() {
   int i = 5;
   printf("\t[in func1] i = %d, j = %d\n", i, j);
   func2();
   printf("\t[back in func1] i = %d, j = %d\n", i, j);
}

int main() {
   int i = 3;
   printf("[in main] i = %d, j = %d\n", i, j);
   func1();
   printf("[back in main] i = %d, j = %d\n", i, j);
}
The results of compiling and executing scope2.c are as follows.
reader@hacking:~/booksrc $ gcc scope2.c
reader@hacking:~/booksrc $ ./a.out
[in main] i = 3, j = 42
        [in func1] i = 5, j = 42
                [in func2] i = 7, j = 42
                [in func2] setting j = 1337
                        [in func3] i = 11, j = 999
                [back in func2] i = 7, j = 1337
        [back in func1] i = 5, j = 1337
[back in main] i = 3, j = 1337
reader@hacking:~/booksrc $
In the output, the global variable j is written to in func2(), and the change persists in all functions except func3(), which has its own local variable called j. In this case, the compiler prefers to use the local variable. With all these variables using the same names, it can be a little confusing, but remember that in the end, it's all just memory. The global variable j is just stored in memory, and every function is able to access that memory. The local variables for each function are each stored in their own places in memory, regardless of the identical names. Printing the memory addresses of these variables will give a clearer picture of what's going on. In the scope3.c example code below, the variable addresses are printed using the unary address-of operator.
#include <stdio.h>

int j = 42; // j is a global variable.

void func3() {
   int i = 11, j = 999; // Here, j is a local variable of func3().
   printf("\t\t\t[in func3] i @ 0x%08x = %d\n", &i, i);
   printf("\t\t\t[in func3] j @ 0x%08x = %d\n", &j, j);
}

void func2() {
   int i = 7;
   printf("\t\t[in func2] i @ 0x%08x = %d\n", &i, i);
   printf("\t\t[in func2] j @ 0x%08x = %d\n", &j, j);
   printf("\t\t[in func2] setting j = 1337\n");
   j = 1337; // Writing to j
   func3();
   printf("\t\t[back in func2] i @ 0x%08x = %d\n", &i, i);
   printf("\t\t[back in func2] j @ 0x%08x = %d\n", &j, j);
}

void func1() {
   int i = 5;
   printf("\t[in func1] i @ 0x%08x = %d\n", &i, i);
   printf("\t[in func1] j @ 0x%08x = %d\n", &j, j);
   func2();
   printf("\t[back in func1] i @ 0x%08x = %d\n", &i, i);
   printf("\t[back in func1] j @ 0x%08x = %d\n", &j, j);
}

int main() {
   int i = 3;
   printf("[in main] i @ 0x%08x = %d\n", &i, i);
   printf("[in main] j @ 0x%08x = %d\n", &j, j);
   func1();
   printf("[back in main] i @ 0x%08x = %d\n", &i, i);
   printf("[back in main] j @ 0x%08x = %d\n", &j, j);
}
The results of compiling and executing scope3.c are as follows.
reader@hacking:~/booksrc $ gcc scope3.c
reader@hacking:~/booksrc $ ./a.out
[in main] i @ 0xbffff834 = 3
[in main] j @ 0x08049988 = 42
        [in func1] i @ 0xbffff814 = 5
        [in func1] j @ 0x08049988 = 42
                [in func2] i @ 0xbffff7f4 = 7
                [in func2] j @ 0x08049988 = 42
                [in func2] setting j = 1337
                        [in func3] i @ 0xbffff7d4 = 11
                        [in func3] j @ 0xbffff7d0 = 999
                [back in func2] i @ 0xbffff7f4 = 7
                [back in func2] j @ 0x08049988 = 1337
        [back in func1] i @ 0xbffff814 = 5
        [back in func1] j @ 0x08049988 = 1337
[back in main] i @ 0xbffff834 = 3
[back in main] j @ 0x08049988 = 1337
reader@hacking:~/booksrc $
In this output, it is obvious that the variable j used by func3() is different from the j used by the other functions. The j used by func3() is located at 0xbffff7d0, while the j used by the other functions is located at 0x08049988. Also, notice that the variable i is actually located at a different memory address for each function.
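The interaction between a global variable and a local variable of the same name can be reduced to a few lines. In this sketch, the global variable setting and the three functions are hypothetical names chosen for illustration.

```c
int setting = 42; // Hypothetical global variable

// Reads the global, since no local variable hides it here.
int read_global(void) {
   return setting;
}

// Writes the global; the change is visible to every other function.
void write_global(int value) {
   setting = value;
}

// The local variable named setting hides the global inside this function.
int read_shadowed(void) {
   int setting = 999;
   return setting;
}
```

After write_global(1337) is called, read_global() returns 1337, while read_shadowed() still returns 999 because it only ever sees its own local variable.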
In the following output, GDB is used to stop execution at a breakpoint in func3(). Then the backtrace command shows the record of each function call on the stack.
reader@hacking:~/booksrc $ gcc -g scope3.c
reader@hacking:~/booksrc $ gdb -q ./a.out
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) list 1
1       #include <stdio.h>
2
3       int j = 42; // j is a global variable.
4
5       void func3() {
6          int i = 11, j = 999; // Here, j is a local variable of func3().
7          printf("\t\t\t[in func3] i @ 0x%08x = %d\n", &i, i);
8          printf("\t\t\t[in func3] j @ 0x%08x = %d\n", &j, j);
9       }
10
(gdb) break 7
Breakpoint 1 at 0x8048388: file scope3.c, line 7.
(gdb) run
Starting program: /home/reader/booksrc/a.out
[in main] i @ 0xbffff804 = 3
[in main] j @ 0x08049988 = 42
        [in func1] i @ 0xbffff7e4 = 5
        [in func1] j @ 0x08049988 = 42
                [in func2] i @ 0xbffff7c4 = 7
                [in func2] j @ 0x08049988 = 42
                [in func2] setting j = 1337

Breakpoint 1, func3 () at scope3.c:7
7          printf("\t\t\t[in func3] i @ 0x%08x = %d\n", &i, i);
(gdb) bt
#0  func3 () at scope3.c:7
#1  0x0804841d in func2 () at scope3.c:17
#2  0x0804849f in func1 () at scope3.c:26
#3  0x0804852b in main () at scope3.c:35
(gdb)
The backtrace also shows the nested function calls by looking at records kept on the stack. Each time a function is called, a record called a stack frame is put on the stack. Each line in the backtrace corresponds to a stack frame. Each stack frame also contains the local variables for that context. The local variables contained in each stack frame can be shown in GDB by adding the word full to the backtrace command.
(gdb) bt full
#0  func3 () at scope3.c:7
        i = 11
        j = 999
#1  0x0804841d in func2 () at scope3.c:17
        i = 7
#2  0x0804849f in func1 () at scope3.c:26
        i = 5
#3  0x0804852b in main () at scope3.c:35
        i = 3
(gdb)
The full backtrace clearly shows that the local variable j only exists in func3()'s context. The global version of the variable j is used in the other functions' contexts.
In addition to globals, variables can also be defined as static variables by prepending the keyword static to the variable definition. Similar to global variables, a static variable remains intact between function calls; however, static variables are also akin to local variables since they remain local within a particular function context. One different and unique feature of static variables is that they are only initialized once. The code in static.c will help explain these concepts.
#include <stdio.h>

void function() { // An example function, with its own context
   int var = 5;
   static int static_var = 5; // Static variable initialization

   printf("\t[in function] var = %d\n", var);
   printf("\t[in function] static_var = %d\n", static_var);
   var++;          // Add one to var.
   static_var++;   // Add one to static_var.
}

int main() { // The main function, with its own context
   int i;
   static int static_var = 1337; // Another static, in a different context

   for(i=0; i < 5; i++) { // Loop 5 times.
      printf("[in main] static_var = %d\n", static_var);
      function(); // Call the function.
   }
}
The aptly named static_var is defined as a static variable in two places: within the context of main() and within the context of function(). Since static variables are local within a particular functional context, these variables can have the same name, but they actually represent two different locations in memory. The function simply prints the values of the two variables in its context and then adds 1 to both of them. Compiling and executing this code will show the difference between the static and nonstatic variables.
reader@hacking:~/booksrc $ gcc static.c
reader@hacking:~/booksrc $ ./a.out
[in main] static_var = 1337
        [in function] var = 5
        [in function] static_var = 5
[in main] static_var = 1337
        [in function] var = 5
        [in function] static_var = 6
[in main] static_var = 1337
        [in function] var = 5
        [in function] static_var = 7
[in main] static_var = 1337
        [in function] var = 5
        [in function] static_var = 8
[in main] static_var = 1337
        [in function] var = 5
        [in function] static_var = 9
reader@hacking:~/booksrc $
Notice that the static_var retains its value between subsequent calls to function(). This is because static variables retain their values, but also because they are only initialized once. In addition, since the static variables are local to a particular functional context, the static_var in the context of main() retains its value of 1337 the entire time.
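A common use of this behavior is a function that counts how many times it has been called. The sketch below contrasts a static counter with an ordinary local one; both function names are hypothetical.

```c
// Hypothetical helper: the static counter is initialized once and persists.
int count_calls(void) {
   static int calls = 0;
   calls++;
   return calls;
}

// Same logic with an ordinary local, which is reinitialized on every call.
int count_calls_nonstatic(void) {
   int calls = 0;
   calls++;
   return calls;
}
```

Successive calls to count_calls() return 1, 2, 3, and so on, while count_calls_nonstatic() returns 1 every time, for the same reason var stays at 5 in the output above.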
Once again, printing the addresses of these variables with the unary address-of operator will provide greater visibility into what's really going on. Take a look at static2.c for an example.
#include <stdio.h>

void function() { // An example function, with its own context
   int var = 5;
   static int static_var = 5; // Static variable initialization

   printf("\t[in function] var @ %p = %d\n", &var, var);
   printf("\t[in function] static_var @ %p = %d\n", &static_var, static_var);
   var++;          // Add 1 to var.
   static_var++;   // Add 1 to static_var.
}

int main() { // The main function, with its own context
   int i;
   static int static_var = 1337; // Another static, in a different context

   for(i=0; i < 5; i++) { // Loop 5 times.
      printf("[in main] static_var @ %p = %d\n", &static_var, static_var);
      function(); // Call the function.
   }
}
The results of compiling and executing static2.c are as follows.
reader@hacking:~/booksrc $ gcc static2.c
reader@hacking:~/booksrc $ ./a.out
[in main] static_var @ 0x804968c = 1337
        [in function] var @ 0xbffff814 = 5
        [in function] static_var @ 0x8049688 = 5
[in main] static_var @ 0x804968c = 1337
        [in function] var @ 0xbffff814 = 5
        [in function] static_var @ 0x8049688 = 6
[in main] static_var @ 0x804968c = 1337
        [in function] var @ 0xbffff814 = 5
        [in function] static_var @ 0x8049688 = 7
[in main] static_var @ 0x804968c = 1337
        [in function] var @ 0xbffff814 = 5
        [in function] static_var @ 0x8049688 = 8
[in main] static_var @ 0x804968c = 1337
        [in function] var @ 0xbffff814 = 5
        [in function] static_var @ 0x8049688 = 9
reader@hacking:~/booksrc $
With the addresses of the variables displayed, it is apparent that the static_var in main() is different from the one found in function(), since they are located at different memory addresses (0x804968c and 0x8049688, respectively). You may have noticed that the local variables all have very high addresses, like 0xbffff814, while the global and static variables all have very low memory addresses, like 0x804968c and 0x8049688. That's very astute of you: noticing details like this and asking why is one of the cornerstones of hacking. Read on for your answers.