[PoP:Rev] Reverse Engineering x86 ELF: 0x4

New Concepts Covered


In this example, we will be analyzing a program that takes in user input and prints the length of the input.

Source Code

#include <stdio.h>

int main()
{
	char input[32];

	/* This is just an example, don't actually do this.
	 * it is vulnerable to buffer overflow. */

	scanf("%s", input);

	int i = 0;
	for(char * c = input; *c != '\0'; c++) {
		i++;
	}

	printf("Your input is %d characters long\n", i);

	return 0;
}

Disassembly

Dump of assembler code for function main:
   0x0804846b <+0>:	push   ebp
   0x0804846c <+1>:	mov    ebp,esp
   0x0804846e <+3>:	sub    esp,0x28
   0x08048471 <+6>:	lea    eax,[ebp-0x28]
   0x08048474 <+9>:	push   eax
   0x08048475 <+10>:	push   0x8048540
   0x0804847a <+15>:	call   0x8048350 <__isoc99_scanf@plt>
   0x0804847f <+20>:	add    esp,0x8
   0x08048482 <+23>:	mov    DWORD PTR [ebp-0x4],0x0
   0x08048489 <+30>:	lea    eax,[ebp-0x28]
   0x0804848c <+33>:	mov    DWORD PTR [ebp-0x8],eax
   0x0804848f <+36>:	jmp    0x8048499 <main+46>
   0x08048491 <+38>:	add    DWORD PTR [ebp-0x4],0x1
   0x08048495 <+42>:	add    DWORD PTR [ebp-0x8],0x1
   0x08048499 <+46>:	mov    eax,DWORD PTR [ebp-0x8]
   0x0804849c <+49>:	movzx  eax,BYTE PTR [eax]
   0x0804849f <+52>:	test   al,al
   0x080484a1 <+54>:	jne    0x8048491 <main+38>
   0x080484a3 <+56>:	push   DWORD PTR [ebp-0x4]
   0x080484a6 <+59>:	push   0x8048544
   0x080484ab <+64>:	call   0x8048330 <printf@plt>
   0x080484b0 <+69>:	add    esp,0x8
   0x080484b3 <+72>:	mov    eax,0x0
   0x080484b8 <+77>:	leave
   0x080484b9 <+78>:	ret
End of assembler dump.

IDA Pro Graph View

Main Graph View


As usual, let’s start off by translating the first few lines of disassembly.

0x0804846b <+0>:	push   ebp
0x0804846c <+1>:	mov    ebp,esp
0x0804846e <+3>:	sub    esp,0x28
0x08048471 <+6>:	lea    eax,[ebp-0x28]
0x08048474 <+9>:	push   eax
0x08048475 <+10>:	push   0x8048540
0x0804847a <+15>:	call   0x8048350 <__isoc99_scanf@plt>
0x0804847f <+20>:	add    esp,0x8
int main()
{
	/* You can actually infer the size of x by looking at
	the next two lines of disassembly. You see offsets 0x8
	and 0x4 being used for variables. So you can guess that
	char x[] is of size 0x28 - 0x8 = 0x20 = 32 */

	char x[32];

	scanf("%s", x);
}

Loops

Now if we refer to the graph view given earlier, we will observe that then next portion of code seems to be setting up a loop. We can observe this from the dark blue arrow that loops back from the bottom left block.

Conceptually, there are three important features of a loop: initialization, terminating condition, and increment. This is seen from how a for loop is constructed.

for (<initialization>; <terminating condition>; <increment>){}

As such, I look out for these three features when I reverse engineer code with loops.

Analyzing Loops

initialization

0x08048482 <+23>:	mov    DWORD PTR [ebp-0x4],0x0
0x08048489 <+30>:	lea    eax,[ebp-0x28]
0x0804848c <+33>:	mov    DWORD PTR [ebp-0x8],eax
0x0804848f <+36>:	jmp    0x8048499 <main+46>

The above snippet of code initializes the variables used in the loop. With reference to what we reversed so far, it sets ebp-0x4 to 0 and sets ebp-0x8 to the address of x.

int y;
char * z;
for (y = 0, z = x; <terminating condition>; <increment>) {}

terminating condition

IDA Pro shows the jump instruction asjnz instead of jne, they are actually the same thing. We’ll follow the vesion from gdb (jne) for this section.

0x08048499 <+46>:	mov    eax,DWORD PTR [ebp-0x8]
0x0804849c <+49>:	movzx  eax,BYTE PTR [eax]
0x0804849f <+52>:	test   al,al
0x080484a1 <+54>:	jne    0x8048491 <main+38>

The above moves the value at ebp-0x8 to eax and dereferences it as a BYTE PTR (i.e. char *), to get its value. After which, a test al, al instruction is executed, followed by a jne instruction. This is a common way of testing if a register contains a zero value. As such, we can derive the following terminating condition.

Please read the entry for the test instruction in the intel manual for a more detailed explaination

int y;
char * z;
for (y = 0, z = x; z != '\0'; <increment>) {}

increment

0x08048491 <+38>:	add    DWORD PTR [ebp-0x4],0x1
0x08048495 <+42>:	add    DWORD PTR [ebp-0x8],0x1

The above code just increments y and z by 1.

int y;
char * z;
for (y = 0, z = x; z != '\0'; y++, z++) {}

At this stage, we would have the following code reversed.

int main()
{
	char x[32];

	scanf("%s", x);

	int y;
	char * z;
	for (y = 0, z = x; z != '\0'; y++, z++) {}
}

We can now reverse the rest of the disassembly.

0x080484a3 <+56>:	push   DWORD PTR [ebp-0x4]
0x080484a6 <+59>:	push   0x8048544
0x080484ab <+64>:	call   0x8048330 <printf@plt>
0x080484b0 <+69>:	add    esp,0x8
0x080484b3 <+72>:	mov    eax,0x0
0x080484b8 <+77>:	leave
0x080484b9 <+78>:	ret
int main()
{
	char x[32];

	scanf("%s", x);

	int y;
	char * z;
	for (y = 0, z = x; *z != '\0'; y++, z++) {}

	printf("Your input is %d characters long\n", y);

	return 0;
}