New Concepts Covered
- Loops
In this example, we will be analyzing a program that takes in user input and prints the length of the input.
Source Code
#include <stdio.h>
int main()
{
char input[32];
/* This is just an example, don't actually do this.
* it is vulnerable to buffer overflow. */
scanf("%s", input);
int i = 0;
for(char * c = input; *c != '\0'; c++) {
i++;
}
printf("Your input is %d characters long\n", i);
return 0;
}
Disassembly
Dump of assembler code for function main:
0x0804846b <+0>: push ebp
0x0804846c <+1>: mov ebp,esp
0x0804846e <+3>: sub esp,0x28
0x08048471 <+6>: lea eax,[ebp-0x28]
0x08048474 <+9>: push eax
0x08048475 <+10>: push 0x8048540
0x0804847a <+15>: call 0x8048350 <__isoc99_scanf@plt>
0x0804847f <+20>: add esp,0x8
0x08048482 <+23>: mov DWORD PTR [ebp-0x4],0x0
0x08048489 <+30>: lea eax,[ebp-0x28]
0x0804848c <+33>: mov DWORD PTR [ebp-0x8],eax
0x0804848f <+36>: jmp 0x8048499 <main+46>
0x08048491 <+38>: add DWORD PTR [ebp-0x4],0x1
0x08048495 <+42>: add DWORD PTR [ebp-0x8],0x1
0x08048499 <+46>: mov eax,DWORD PTR [ebp-0x8]
0x0804849c <+49>: movzx eax,BYTE PTR [eax]
0x0804849f <+52>: test al,al
0x080484a1 <+54>: jne 0x8048491 <main+38>
0x080484a3 <+56>: push DWORD PTR [ebp-0x4]
0x080484a6 <+59>: push 0x8048544
0x080484ab <+64>: call 0x8048330 <printf@plt>
0x080484b0 <+69>: add esp,0x8
0x080484b3 <+72>: mov eax,0x0
0x080484b8 <+77>: leave
0x080484b9 <+78>: ret
End of assembler dump.
IDA Pro Graph View
As usual, let’s start off by translating the first few lines of disassembly.
0x0804846b <+0>: push ebp
0x0804846c <+1>: mov ebp,esp
0x0804846e <+3>: sub esp,0x28
0x08048471 <+6>: lea eax,[ebp-0x28]
0x08048474 <+9>: push eax
0x08048475 <+10>: push 0x8048540
0x0804847a <+15>: call 0x8048350 <__isoc99_scanf@plt>
0x0804847f <+20>: add esp,0x8
int main()
{
/* You can actually infer the size of x by looking at
the next two lines of disassembly. You see offsets 0x8
and 0x4 being used for variables. So you can guess that
char x[] is of size 0x28 - 0x8 = 0x20 = 32 */
char x[32];
scanf("%s", x);
}
Loops
Now if we refer to the graph view given earlier, we will observe that then next portion of code seems to be setting up a loop. We can observe this from the dark blue arrow that loops back from the bottom left block.
Conceptually, there are three important features of a loop: initialization, terminating condition, and increment. This is seen from how a for
loop is constructed.
for (<initialization>; <terminating condition>; <increment>){}
As such, I look out for these three features when I reverse engineer code with loops.
initialization
0x08048482 <+23>: mov DWORD PTR [ebp-0x4],0x0
0x08048489 <+30>: lea eax,[ebp-0x28]
0x0804848c <+33>: mov DWORD PTR [ebp-0x8],eax
0x0804848f <+36>: jmp 0x8048499 <main+46>
The above snippet of code initializes the variables used in the loop. With reference to what we reversed so far, it sets ebp-0x4
to 0 and sets ebp-0x8
to the address of x
.
int y;
char * z;
for (y = 0, z = x; <terminating condition>; <increment>) {}
terminating condition
IDA Pro shows the jump instruction as
jnz
instead ofjne
, they are actually the same thing. We’ll follow the vesion from gdb (jne
) for this section.
0x08048499 <+46>: mov eax,DWORD PTR [ebp-0x8]
0x0804849c <+49>: movzx eax,BYTE PTR [eax]
0x0804849f <+52>: test al,al
0x080484a1 <+54>: jne 0x8048491 <main+38>
The above moves the value at ebp-0x8
to eax
and dereferences it as a BYTE PTR
(i.e. char *
), to get its value. After which, a test al, al
instruction is executed, followed by a jne
instruction. This is a common way of testing if a register contains a zero value. As such, we can derive the following terminating condition.
Please read the entry for the
test
instruction in the intel manual for a more detailed explaination
int y;
char * z;
for (y = 0, z = x; z != '\0'; <increment>) {}
increment
0x08048491 <+38>: add DWORD PTR [ebp-0x4],0x1
0x08048495 <+42>: add DWORD PTR [ebp-0x8],0x1
The above code just increments y
and z
by 1.
int y;
char * z;
for (y = 0, z = x; z != '\0'; y++, z++) {}
At this stage, we would have the following code reversed.
int main()
{
char x[32];
scanf("%s", x);
int y;
char * z;
for (y = 0, z = x; z != '\0'; y++, z++) {}
}
We can now reverse the rest of the disassembly.
0x080484a3 <+56>: push DWORD PTR [ebp-0x4]
0x080484a6 <+59>: push 0x8048544
0x080484ab <+64>: call 0x8048330 <printf@plt>
0x080484b0 <+69>: add esp,0x8
0x080484b3 <+72>: mov eax,0x0
0x080484b8 <+77>: leave
0x080484b9 <+78>: ret
int main()
{
char x[32];
scanf("%s", x);
int y;
char * z;
for (y = 0, z = x; *z != '\0'; y++, z++) {}
printf("Your input is %d characters long\n", y);
return 0;
}