New Concepts Covered
- Local Variables
- Calling Functions
In this article, we will look at the a program that reads in 3 integers from user input, feeds it into a function to add them, and prints the sum of the integers.
Source Code
#include <stdio.h>
int add(int x, int y, int z)
{
int i = 0;
/* Please do not do the following.
It's vulnerable to integer overflow. */
i = x + y + z;
return i;
}
int main()
{
int x, y, z;
printf("Enter 3 integers: ");
scanf("%d %d %d", &x, &y, &z);
printf("The sum is %d\n", add(x, y, z));
return 0;
}
Disassembly
Dump of assembler code for function main:
0x0804848d <+0>: push ebp
0x0804848e <+1>: mov ebp,esp
0x08048490 <+3>: sub esp,0xc
0x08048493 <+6>: push 0x8048570
0x08048498 <+11>: call 0x8048330 <printf@plt>
0x0804849d <+16>: add esp,0x4
0x080484a0 <+19>: lea eax,[ebp-0xc]
0x080484a3 <+22>: push eax
0x080484a4 <+23>: lea eax,[ebp-0x8]
0x080484a7 <+26>: push eax
0x080484a8 <+27>: lea eax,[ebp-0x4]
0x080484ab <+30>: push eax
0x080484ac <+31>: push 0x8048583
0x080484b1 <+36>: call 0x8048350 <__isoc99_scanf@plt>
0x080484b6 <+41>: add esp,0x10
0x080484b9 <+44>: mov ecx,DWORD PTR [ebp-0xc]
0x080484bc <+47>: mov edx,DWORD PTR [ebp-0x8]
0x080484bf <+50>: mov eax,DWORD PTR [ebp-0x4]
0x080484c2 <+53>: push ecx
0x080484c3 <+54>: push edx
0x080484c4 <+55>: push eax
0x080484c5 <+56>: call 0x804846b <add>
0x080484ca <+61>: add esp,0xc
0x080484cd <+64>: push eax
0x080484ce <+65>: push 0x804858c
0x080484d3 <+70>: call 0x8048330 <printf@plt>
0x080484d8 <+75>: add esp,0x8
0x080484db <+78>: mov eax,0x0
0x080484e0 <+83>: leave
0x080484e1 <+84>: ret
End of assembler dump.
Dump of assembler code for function add:
0x0804846b <+0>: push ebp
0x0804846c <+1>: mov ebp,esp
0x0804846e <+3>: sub esp,0x4
0x08048471 <+6>: mov DWORD PTR [ebp-0x4],0x0
0x08048478 <+13>: mov edx,DWORD PTR [ebp+0x8]
0x0804847b <+16>: mov eax,DWORD PTR [ebp+0xc]
0x0804847e <+19>: add edx,eax
0x08048480 <+21>: mov eax,DWORD PTR [ebp+0x10]
0x08048483 <+24>: add eax,edx
0x08048485 <+26>: mov DWORD PTR [ebp-0x4],eax
0x08048488 <+29>: mov eax,DWORD PTR [ebp-0x4]
0x0804848b <+32>: leave
0x0804848c <+33>: ret
End of assembler dump.
Local Variables
0x0804848d <+0>: push ebp
0x0804848e <+1>: mov ebp,esp
0x08048490 <+3>: sub esp,0xc
Similar to the previous article, the disassembly starts off with a function prologue. However, there is an additional sub
instruction in this scenario. In the above, esp
is being decremented by 0xc
, which is 12 in decimal. If we look back at the source code for main
, we see that three integers (x
, y
, z
) are initialized at the start of the function. Since integers are 4 bytes, sub esp, 0xc
is the program’s attempt at allocating space for local variables on the stack for those 3 integers.
Calling Functions
When doing reverse engineering, I generally find it useful to start off by looking for call
instructions. This is because programs are essentially a sequence of function calls. Therefore, splitting it into chunks of call
instructions would allow us to get a good idea of what is going on.
After finding each call
instruction, I would then look at the preceding lines of code to find all the corresponding push
instructions. By doing so, we are effectively identifying all the parameters passed into the forementioned function call.
Let’s use this strategy to continue our analysis of main
.
printf
0x08048493 <+6>: push 0x8048570
0x08048498 <+11>: call 0x8048330 <printf@plt>
0x0804849d <+16>: add esp,0x4
From the above call
chunk, we are able to derive the following line of code,
printf(0x8048570);
If we look at the documentation for printf
, we know that it takes in a parameter of type char *
. This would be the (format) string that will be printed. So if we go with the assumption that 0x8048570
is the address of the string to be printed and verify it using gdb
,
Note: Most disassemblers will resolve this for you by default.
gef➤ print (char *) 0x8048570
$1 = 0x8048570 "Enter 3 integers: "
we will find that it’s as we expect. Hence, we have effectively reversed the following line of code:
printf("Enter 3 integers: ");
Immedately after the call
instruction, add esp, 0x4
is called. This is done to clean up the parameters that were pushed onto the stack to perform the function call. In this scenario, we only push one parameter onto the stack (4 bytes), so esp
is incremented by 4. This cleanup is done in accordance to the cdecl calling convention whereby the caller main
, performs cleanup on the behalf of the callee printf
.
scanf
0x080484a0 <+19>: lea eax,[ebp-0xc]
0x080484a3 <+22>: push eax
0x080484a4 <+23>: lea eax,[ebp-0x8]
0x080484a7 <+26>: push eax
0x080484a8 <+27>: lea eax,[ebp-0x4]
0x080484ab <+30>: push eax
0x080484ac <+31>: push 0x8048583
0x080484b1 <+36>: call 0x8048350 <__isoc99_scanf@plt>
0x080484b6 <+41>: add esp,0x10
In this example, we see three repetitions of lea
followed by push
. If we refer to intel manual, we will see that lea
is an initialism for load effective address. In short, lea r32, m
means that the address of m
will be loaded into the 32-bit register r32
.
In this case, we are loading the address of [ebp-0xc]
into the register eax
. Note that surrounding our operand with square brackets references the value stored at that address. Therefore, lea eax, [ebp-0xc]
translates to: load the effective address of the value stored at ebp-0xc
into the 32-bit register eax
.
If we recall, a sub esp, 0xc
was done in the function prologue to make space for local variables. Currently, we are pushing the addresses of [ebp-0xc]
, [ebp-0x8]
, and [ebp-0x4]
which are our three local variables onto the stack. This is followed by 0x8048583
which is the address of the format string %d %d %d
.
More observant readers would have noticed at this point that while scanf()
expects a format string followed by the corresponding addresses to read data into, the 3 integer addresses were pushed onto the stack first, followed by the format string. This is due to cdecl, which specifies that parameters for function calls should be pushed from right to left.
This time, add esp, 0x10
is done for cleanup because we pushed 4 variables onto the stack.
Based on this information, we would have reversed even more of this program.
int x, y, z;
printf("Enter 3 integers: ");
scanf("%d %d %d", &x, &y, &z);
add
0x080484b9 <+44>: mov ecx,DWORD PTR [ebp-0xc]
0x080484bc <+47>: mov edx,DWORD PTR [ebp-0x8]
0x080484bf <+50>: mov eax,DWORD PTR [ebp-0x4]
0x080484c2 <+53>: push ecx
0x080484c3 <+54>: push edx
0x080484c4 <+55>: push eax
0x080484c5 <+56>: call 0x804846b <add>
0x080484ca <+61>: add esp,0xc
Based on what we have done so far, we can quickly translate the above to
add(x, y, z);
What we are more interested to discuss lies in the following snippet from the disassembly of the add
function.
0x08048471 <+6>: mov DWORD PTR [ebp-0x4],0x0
0x08048478 <+13>: mov edx,DWORD PTR [ebp+0x8]
0x0804847b <+16>: mov eax,DWORD PTR [ebp+0xc]
0x0804847e <+19>: add edx,eax
0x08048480 <+21>: mov eax,DWORD PTR [ebp+0x10]
0x08048483 <+24>: add eax,edx
0x08048485 <+26>: mov DWORD PTR [ebp-0x4],eax
0x08048488 <+29>: mov eax,DWORD PTR [ebp-0x4]
0x0804848b <+32>: leave
0x0804848c <+33>: ret
If we are familiar with how stack frames work, we would be able to tell that [ebp+0x8]
, [ebp+0xc]
, and [ebp+0x10]
are the three parameters passed into add
, namely x
, y
, and z
. From lines +13
to +24
, we can see that eax = x + y + z
after line +24
executes.
In the next two lines, the value in eax
is copied to the local variable ebp-0x4
and then copied back into eax
. Going back to the previous article, we know that this is because eax
will contain the return value by convention.
Based on this, we will be able to add on to what we have reversed.
int add(int x, int y, int z)
{
int i = 0;
i = x + y + z;
return i;
}
int main()
{
int x, y, z;
printf("Enter 3 integers: ");
scanf("%d %d %d", &x, &y, &z);
add(x, y z);
}
Let’s continue with the next few lines in main
,
0x080484cd <+64>: push eax
0x080484ce <+65>: push 0x804858c
0x080484d3 <+70>: call 0x8048330 <printf@plt>
0x080484d8 <+75>: add esp,0x8
Since we know that eax
contains the return value of add(x, y, z)
and 0x804858c
contains the format string The sum is %d\n
, we can derive the following code.
int add(int x, int y, int z)
{
int i = 0;
i = x + y + z;
return i;
}
int main()
{
int x, y, z;
printf("Enter 3 integers: ");
scanf("%d %d %d", &x, &y, &z);
printf("The sum is %d\n", add(x, y z));
}
and if we account for the following function epilogue,
0x080484db <+78>: mov eax,0x0
0x080484e0 <+83>: leave
0x080484e1 <+84>: ret
we would have effectively reversed the following C code from the disassembly.
int add(int x, int y, int z)
{
int i = 0;
i = x + y + z;
return i;
}
int main()
{
int x, y, z;
printf("Enter 3 integers: ");
scanf("%d %d %d", &x, &y, &z);
printf("The sum is %d\n", add(x, y z));
return 0;
}