Buffer Overflow
Machine code is usually represented in a more readable form called assembly code. Machine code is usually produced by a compiler, which takes the source code of a file and, after going through some intermediate stages, produces machine code that can be executed by a computer. Intel first started out by building a 16-bit instruction set, followed by a 32-bit one, after which the 64-bit instruction set followed. These instruction sets were designed to be backward compatible, so code compiled for the 32-bit architecture will run on 64-bit machines.
Before an executable file is produced, the source code is first compiled into assembly (.s files), after which the assembler converts it into an object program (.o files), and the linker finally turns it into an executable:
Source code -> compiled into assembly (.s) -> assembler converts it into an object program (.o) -> linker makes it an executable.
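As a rough illustration of these stages, the sketch below is a tiny C file with the usual gcc invocation for each stage written in its comment. The file name stages.c and the function add_numbers are made up for this example, and the commands assume gcc is installed.

```c
/* stages.c: hypothetical file name, used only to illustrate the build stages.
 *
 *   gcc -S stages.c      source   -> assembly  (stages.s)
 *   gcc -c stages.s      assembly -> object    (stages.o)
 *
 * Linking stages.o into an executable additionally needs an object file
 * that defines main, e.g.  gcc stages.o main.o -o stages
 * (main.o is assumed to exist; it is not part of this sketch).
 */

/* A function small enough that its assembly in stages.s is easy to read. */
int add_numbers(int a, int b)
{
    return a + b;
}
```

Opening stages.s after the first command is a quick way to see the kind of assembly discussed in the rest of these notes.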
radare2 is a framework for reverse engineering and analyzing binaries.
1. Open the binary in debugging mode (intro is an example program):
r2 -d intro
2. Once it opens the binary, let's use "aa" to analyze all symbols and entry points in the program.
3. Set the disassembly syntax
e asm.syntax=att
4. Typing "?" opens the general help view; "a?" opens the help view for the analysis commands.
5. Once the analysis is complete, you would want to know where to start analysing from - most programs have an entry point defined as main. To find a list of the functions run:
afl
6. Examine the individual functions:
pdf @main
(pdf = print disassembly function)
The core of assembly language involves using registers to do the following:
- Transfer data between memory and register, and vice versa
- Perform arithmetic operations on registers and data
- Transfer control to other parts of the program
Since the architecture is x86-64, the registers are 64 bit and Intel has a list of 16 registers:
64 bit | 32 bit |
%rax | %eax |
%rbx | %ebx |
%rcx | %ecx |
%rdx | %edx |
%rsi | %esi |
%rdi | %edi |
%rsp | %esp |
%rbp | %ebp |
%r8 | %r8d |
%r9 | %r9d |
%r10 | %r10d |
%r11 | %r11d |
%r12 | %r12d |
%r13 | %r13d |
%r14 | %r14d |
%r15 | %r15d |
Even though the registers are 64 bit, meaning they can hold up to 64 bits of data, smaller parts of the registers can also be referenced. In this case, registers can also be referenced as 32-bit values as shown. What isn’t shown is that registers can also be referenced as 16-bit values (for example %ax) and as 8-bit values: the low byte (for example %al) and, for %rax, %rbx, %rcx and %rdx, the high byte of the 16-bit part (for example %ah).
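The relationship between a 64-bit register and its smaller names can be sketched in C with casts and shifts. This is only a model: the helper names below are invented for the example, and a plain integer stands in for the register.

```c
#include <stdint.h>

/* Each helper mimics reading one named part of a 64-bit register such as %rax. */
uint32_t low32(uint64_t reg) { return (uint32_t)reg; }        /* like %eax */
uint16_t low16(uint64_t reg) { return (uint16_t)reg; }        /* like %ax  */
uint8_t  low8(uint64_t reg)  { return (uint8_t)reg; }         /* like %al  */
uint8_t  high8(uint64_t reg) { return (uint8_t)(reg >> 8); }  /* like %ah  */
```

With the value 0x1122334455667788 in the register, the 32-bit view is 0x55667788, the 16-bit view is 0x7788, the low byte is 0x88 and the high byte of the 16-bit part is 0x77.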
The first 6 registers are known as general purpose registers. %rsp is the stack pointer and it points to the top of the stack, that is, the address of the most recently pushed value. The stack is a data structure that manages memory for programs. %rbp is the frame pointer and points to the frame of the function currently being executed; every function is executed in a new frame. To move data using registers, the following instruction is used:
movq source, destination
This involves:
- Transferring constants (which are prefixed using the $ operator), e.g. movq $3, %rax would move the constant 3 to the register %rax
- Transferring values from a register, e.g. movq %rax, %rbx which moves the value from %rax to %rbx
- Transferring values to or from memory, which is shown by putting registers inside brackets, e.g. movq %rax, (%rbx) which means move the value stored in %rax to the memory location whose address is held in %rbx.
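The three kinds of move map loosely onto ordinary C assignments. The sketch below is an analogy rather than what a compiler would necessarily emit; the function names are invented, and local variables stand in for registers and memory.

```c
#include <stdint.h>

/* movq $3, %rax : a constant is copied into a register. */
uint64_t mov_constant(void)
{
    uint64_t rax = 3;
    return rax;
}

/* movq %rax, %rbx : one register is copied into another. */
uint64_t mov_register(uint64_t rax)
{
    uint64_t rbx = rax;
    return rbx;
}

/* movq %rax, (%rbx) followed by movq (%rbx), %rax :
 * the brackets mean "the memory that %rbx points to". */
uint64_t mov_through_memory(uint64_t rax)
{
    uint64_t slot = 0;       /* a stack slot standing in for memory */
    uint64_t *rbx = &slot;   /* %rbx holds the address of that slot */
    *rbx = rax;              /* register -> memory */
    return *rbx;             /* memory -> register */
}
```

The pointer dereference in the last function plays the role of the brackets: without them the register itself is the operand, with them the operand is the memory it points to.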
The last letter of the mov instruction represents the size of the data:
Intel Data Type | Suffix | Size(bytes) |
Byte | b | 1 |
Word | w | 2 |
Double Word | l | 4 |
Quad Word | q | 8 |
Single Precision | s | 4 |
Double Precision | l | 8 |
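When dealing with memory manipulation using registers, there are other addressing forms to be considered, where Rb is a base register, Ri is an index register, S is a scale factor (1, 2, 4 or 8) and D is a constant displacement:
- (Rb, Ri) = MemoryLocation[Rb + Ri]
- D(Rb, Ri) = MemoryLocation[Rb + Ri + D]
- (Rb, Ri, S) = MemoryLocation[Rb + S * Ri]
- D(Rb, Ri, S) = MemoryLocation[Rb + S * Ri + D]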
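The address arithmetic behind these forms can be written out as a small C function. This is a sketch of the formula only; the function names are invented for the example and nothing here reads real memory.

```c
#include <stdint.h>

/* Computes the address denoted by D(Rb, Ri, S), i.e. Rb + S * Ri + D.
 * The simpler forms are the same formula with D = 0 and/or S = 1. */
uint64_t effective_address(int64_t d, uint64_t rb, uint64_t ri, uint64_t s)
{
    return rb + s * ri + (uint64_t)d;
}

/* Address of element i of an array of 4-byte ints starting at base,
 * which corresponds to the operand (base, i, 4). */
uint64_t int_element_address(uint64_t base, uint64_t i)
{
    return effective_address(0, base, i, 4);
}
```

For example, 8(Rb, Ri, 4) with Rb = 0x1000 and Ri = 3 gives 0x1000 + 12 + 8 = 0x1014, and a negative displacement such as -4 with Rb = 0x1000 gives 0xffc.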
Some other important instructions are:
- leaq source, destination: this instruction sets destination to the address denoted by the expression in source
- addq source, destination: destination = destination + source
- subq source, destination: destination = destination - source
- imulq source, destination: destination = destination * source
- salq source, destination: destination = destination << source where << is the left bit shifting operator
- sarq source, destination: destination = destination >> source where >> is the right bit shifting operator
- xorq source, destination: destination = destination XOR source
- andq source, destination: destination = destination & source
- orq source, destination: destination = destination | source
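To make the operand order concrete, here is a minimal C model in which each function takes its arguments in the same source-then-destination order and returns the new value of the destination. The functions are named after the instructions purely for readability; they are ordinary C functions, not compiler intrinsics, and they assume small values and shift counts below 64.

```c
#include <stdint.h>

/* Operand order follows the list above: source first, destination second.
 * Each function returns the new value of the destination. */
int64_t addq(int64_t src, int64_t dst)  { return dst + src; }
int64_t subq(int64_t src, int64_t dst)  { return dst - src; }
int64_t imulq(int64_t src, int64_t dst) { return dst * src; }
int64_t xorq(int64_t src, int64_t dst)  { return dst ^ src; }
int64_t andq(int64_t src, int64_t dst)  { return dst & src; }
int64_t orq(int64_t src, int64_t dst)   { return dst | src; }

/* Left shift, done on the unsigned representation to stay well defined. */
int64_t salq(int64_t count, int64_t dst)
{
    return (int64_t)((uint64_t)dst << count);
}

/* Right shift; only non-negative values are used here because shifting a
 * negative signed value right is implementation-defined in C. */
int64_t sarq(int64_t count, int64_t dst)
{
    return dst >> count;
}
```

Note that subq(3, 10) yields 7, not -7, because the source is subtracted from the destination, and that xorq of a value with itself yields 0, which is a common way to clear a register.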
The general format of an if statement is
if(condition)
{ do-stuff-here
}
else if(condition) //this is an optional condition
{ do-stuff-here
}
else { do-stuff-here
}
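If statements rely on three kinds of instructions in assembly: cmpq, testq and the jump instructions.
- cmpq source2, source1: it is like computing source1 - source2 without storing the result in a destination; only the condition flags are set
- testq source2, source1: it is like computing source1 & source2 without storing the result in a destination; only the condition flags are set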
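A simplified C model of these two instructions is sketched below. It tracks only a zero flag and a sign flag and ignores the carry and overflow flags that the real processor also sets, so it is an illustration for small values rather than a faithful emulation. The struct and helper names are invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified condition flags: only zero and sign are modelled. */
struct flags {
    bool zf;  /* result was zero     */
    bool sf;  /* result was negative */
};

static struct flags flags_from(int64_t result)
{
    struct flags f = { result == 0, result < 0 };
    return f;
}

/* cmpq source2, source1: flags of source1 - source2, result discarded. */
struct flags cmpq(int64_t source2, int64_t source1)
{
    return flags_from(source1 - source2);
}

/* testq source2, source1: flags of source1 & source2, result discarded. */
struct flags testq(int64_t source2, int64_t source1)
{
    return flags_from(source1 & source2);
}
```

Comparing 3 against 4 with cmpq(4, 3) produces a negative result, so the sign flag is set and the zero flag is clear; testing a register against itself sets the zero flag exactly when the register holds 0.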
Jump instructions are used to transfer control to different instructions, and there are different types of jumps:
Jump Type | Description |
jmp | Unconditional |
je | Equal/Zero |
jne | Not Equal/Not Zero |
js | Negative |
jns | Nonnegative |
jg | Greater |
jge | Greater or Equal |
jl | Less |
jle | Less or Equal |
ja | Above(unsigned) |
jb | Below(unsigned) |
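The difference between the signed jumps (jg, jge, jl, jle) and the unsigned ones (ja, jb) matches the difference between signed and unsigned comparisons in C. In the sketch below, a and b stand for the two values being compared (a is the second operand of the compare, b the first); the function names are invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Would jg be taken after comparing a with b? Signed "greater". */
bool would_jg(int32_t a, int32_t b)   { return a > b; }

/* Would jge be taken? Signed "greater or equal". */
bool would_jge(int32_t a, int32_t b)  { return a >= b; }

/* Would ja be taken? Unsigned "above": the same bits read as unsigned. */
bool would_ja(uint32_t a, uint32_t b) { return a > b; }
```

The bit pattern 0xFFFFFFFF is -1 when read as signed and 4294967295 when read as unsigned, so jg would not be taken against 1 while ja would be.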
1. Set a breakpoint at the jge instruction and at the jmp instruction, using the address of each instruction shown in the disassembly:
db HEX_CODE
We’ve added breakpoints to stop the execution of the program at those points so we can see the state of the program.
2. Run "dc" to start execution of the program
What happened before we hit the breakpoint?
- The first 2 lines are about pushing the frame pointer onto the stack and saving it(this is about how functions are called, and will be examined later)
- The next 3 lines are about assigning values 3 and 4 to the local arguments/variables var_8h and var_4h. It then stores the value in var_8h in the %eax register.
- The cmpl instruction compares the value of eax with that of the var_4h variable
3. Typing "dr" will show the value of the registers.
We can see that rax, which is the 64 bit version of eax, contains 3. We saw that the jge instruction jumps based on whether the value of eax is greater than or equal to var_4h. To see what’s in var_4h, note that the top of the main function tells us the position of var_4h. Run the command:
px @rbp-0x4
And that shows the value of 4.
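px prints a hex dump of the raw bytes at that address, and x86-64 is little-endian, so a 4-byte variable holding 4 appears with the 04 byte first, followed by three zero bytes. The C sketch below shows the same byte order; it assumes it is compiled for a little-endian machine such as x86-64, and the function name is invented for the example.

```c
#include <stdint.h>
#include <string.h>

/* Returns byte `index` (0..3) of a 32-bit value as it is laid out in memory.
 * Index 0 is the byte at the lowest address, i.e. the first one px prints. */
uint8_t byte_in_memory(uint32_t value, unsigned index)
{
    uint8_t bytes[sizeof value];
    memcpy(bytes, &value, sizeof value);  /* copy the raw in-memory bytes */
    return bytes[index];
}
```

On a little-endian machine the value 0x11223344 is stored as the bytes 44 33 22 11, which is why multi-byte values look reversed in a hex dump.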
We know that eax contains 3, and 3 is not greater than or equal to 4, so the jump will not execute. Instead it will move to the next instruction. To check this, run the ds command, which steps onto the next instruction. Now it has moved onto the next instruction, which adds 5 to var_8h.
Check the var_8h value by typing "px @rbp-0x8".
We can see that it was incremented by 5!
The next instruction is an unconditional jump and it just jumps to clearing the eax register. The popq instruction pops a value off the stack and reads it into its operand, and the return instruction pops the return address off the stack and sets the instruction pointer to it. In this case, it shows the execution of the program has been completed. To better understand how an if statement works, you can check the corresponding C file in the same folder.
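For readers without that file at hand, the walkthrough is consistent with C code along the following lines. This is a guess reconstructed from the steps above, not the actual source: the function name and parameters are invented, and the contents of the else branch are unknown because the walkthrough never reaches it.

```c
/* Hypothetical reconstruction: a plays the role of var_8h, b of var_4h. */
int if_sketch(int a, int b)
{
    if (a < b) {
        /* Reached when the jge is NOT taken, as with a = 3 and b = 4. */
        a += 5;
    } else {
        /* Not reached in the walkthrough; contents unknown, left empty. */
    }
    return a;  /* returned here only so the result can be inspected */
}
```

Calling if_sketch(3, 4) follows the same path as the debugging session: the condition holds, 5 is added, and the result is 8. The compiler expresses "a < b" by jumping past the body when the opposite condition, greater or equal, is true.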
if2 practice
1. r2 -d if2 to open it in debugging mode
2. aaa to analyze the binary and e asm.syntax=att to set the disassembly syntax to AT&T
3. afl to list the functions (found main!)
4. pdf @main to disassemble the function
5. Set breakpoints at the jge and right before the jmp: db HEX
6. dc to run the program.
7. ds to move to the next instruction, and pdf @main to check the current point of execution
8. Then, after running px @rbp-0x8, the value at offset zero is 0x60, or 96.
Usually two types of loops are used: for loops and while loops. The general format of a while loop is:
while(condition){
Do-stuff-here
Change value used in condition
}
The general format of a for loop is
for(initialise value; condition; change value used in condition){
do-stuff-here
}
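As a concrete instance of both formats, the sketch below sums the numbers 0 to n-1 once with a while loop and once with a for loop. The function names are invented for the example.

```c
/* Sums 0 + 1 + ... + (n - 1) with a while loop. */
int sum_with_while(int n)
{
    int total = 0;
    int i = 0;             /* initialise value */
    while (i < n) {        /* condition */
        total += i;        /* do-stuff-here */
        i++;               /* change value used in condition */
    }
    return total;
}

/* The same computation with a for loop: the three parts move into the header. */
int sum_with_for(int n)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += i;
    }
    return total;
}
```

Both functions do the same work, which is why the two loop forms tend to look very similar once disassembled: a compare, a conditional jump out of the loop, and a jump back to the compare.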
dwperuc3sv
0x00400567
670540000x