OSCP Notes

Buffer Overflow

Intro to Intel x86-64 Assembly Language

Machine code is usually represented in a more readable form called assembly code. Machine code is usually produced by a compiler, which takes the source code of a file and, after going through some intermediate stages, produces machine code that can be executed by a computer. Intel's x86 family started out with a 16-bit instruction set, which was later extended to 32 bit and finally to 64 bit. Each of these instruction sets was designed to be backward compatible, so code compiled for the 32-bit architecture will run on 64-bit machines.
Before an executable file is produced, the source code is first compiled into assembly (.s files), after which the assembler converts it into an object program (.o files), and finally a linker turns it into an executable:
Source code -> compiled into assembly (.s) -> assembler converts it into an object program (.o) -> linker makes it an executable

Assembly with radare2

radare2 is a framework for reverse engineering and analyzing binaries.
1. Open the binary in debugging mode (intro is an example program):
r2 -d intro
2. Once it opens the binary, let's use "aa" to analyze all symbols and entry points in the program.
3. Set the disassembly syntax
e asm.syntax=att
4. Typing "?" opens the help view; "a?" opens the help view for the analysis commands.
5. Once the analysis is complete, you would want to know where to start analysing from - most programs have an entry point defined as main. To find a list of the functions run:
afl
6. Examine the individual functions: pdf @main
(pdf = print disassembly of function)
The core of assembly language involves using registers to do the following:
  • Transfer data between memory and register, and vice versa
  • Perform arithmetic operations on registers and data
  • Transfer control to other parts of the program
Since the architecture is x86-64, the registers are 64 bit and Intel has a list of 16 registers:
64 bit    32 bit
%rax      %eax
%rbx      %ebx
%rcx      %ecx
%rdx      %edx
%rsi      %esi
%rdi      %edi
%rsp      %esp
%rbp      %ebp
%r8       %r8d
%r9       %r9d
%r10      %r10d
%r11      %r11d
%r12      %r12d
%r13      %r13d
%r14      %r14d
%r15      %r15d
Even though the registers are 64 bit, meaning they can hold up to 64 bits of data, smaller parts of the registers can also be referenced. In this case, registers can also be referenced as 32 bit values as shown. What isn't shown is that registers can also be referenced as 16 bit values (e.g. %ax) and 8 bit values (e.g. %al for the lowest byte and, for the first four registers, %ah for the byte above it).
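The narrower register names are views onto the low bits of the same 64 bit register. The following is a minimal sketch that models this with bit masks, using %eax, %ax, %ah and %al (the 32, 16 and 8 bit names of %rax). The function name sub_registers and the sample value are hypothetical and chosen only for illustration.

```python
def sub_registers(rax: int) -> dict:
    """Return the narrower views of a 64-bit value held in %rax."""
    rax &= 0xFFFFFFFFFFFFFFFF        # keep only the low 64 bits
    return {
        "rax": rax,                  # full 64 bits
        "eax": rax & 0xFFFFFFFF,     # low 32 bits
        "ax": rax & 0xFFFF,          # low 16 bits
        "ah": (rax >> 8) & 0xFF,     # bits 8 to 15
        "al": rax & 0xFF,            # low 8 bits
    }


views = sub_registers(0x1122334455667788)  # hypothetical sample value
for name, value in views.items():
    print(f"%{name:<3} = {value:#x}")
```

This only models reading the views; it does not model what happens to the upper bits when a narrower register is written.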
The first 6 registers are known as general purpose registers. %rsp is the stack pointer: it holds the address of the top of the stack, which is where the most recently pushed value lives. The stack is a data structure that programs use to manage memory for function calls. %rbp is the frame pointer and points to the frame of the function currently being executed - every function is executed in a new frame. To move data using registers, the following instruction is used:
movq source, destination
This involves:
  • Transferring constants (which are prefixed using the $ operator) e.g. movq $3, %rax would move the constant 3 to the register %rax
  • Transferring values from a register e.g. movq %rax, %rbx which moves the value from %rax to %rbx
  • Transferring values to or from memory, which is shown by putting registers inside brackets e.g. movq %rax, (%rbx) which means move the value stored in %rax to the memory location whose address is held in %rbx.
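The three kinds of transfer can be traced on a toy machine state. This is only a sketch: the dictionaries registers and memory, and the address 0x1000, are hypothetical stand-ins for the real CPU state, and each Python line is annotated with the AT&T instruction it imitates.

```python
# Toy machine state: two registers and a sparse memory (address -> value).
registers = {"rax": 0, "rbx": 0}
memory = {}

registers["rax"] = 3                          # movq $3, %rax      (constant -> register)
registers["rbx"] = registers["rax"]           # movq %rax, %rbx    (register -> register)
registers["rbx"] = 0x1000                     # movq $0x1000, %rbx (hypothetical address)
memory[registers["rbx"]] = registers["rax"]   # movq %rax, (%rbx)  (register -> memory)
registers["rax"] = 0                          # movq $0, %rax
registers["rax"] = memory[registers["rbx"]]   # movq (%rbx), %rax  (memory -> register)

print(registers)
print(memory)
```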
The last letter of the mov instruction represents the size of the data:
Intel Data Type     Suffix    Size(bytes)
Byte                b         1
Word                w         2
Double Word         l         4
Quad Word           q         8
Single Precision    s         4
Double Precision    l         8
When dealing with memory manipulation using registers, there are other cases to be considered. Here Rb is a base register, Ri is an index register, S is a scale factor (1, 2, 4 or 8) and D is a constant displacement:
  • (Rb, Ri) = MemoryLocation[Rb + Ri]
  • D(Rb, Ri) = MemoryLocation[Rb + Ri + D]
  • (Rb, Ri, S) = MemoryLocation[Rb + S * Ri]
  • D(Rb, Ri, S) = MemoryLocation[Rb + S * Ri + D]
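All four forms are special cases of one address computation, Rb + S * Ri + D, with the missing parts defaulting to S = 1 and D = 0. A minimal sketch follows; the function name effective_address and the sample register values are hypothetical.

```python
def effective_address(base: int, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    """Compute the address referenced by the operand disp(base, index, scale)."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    # Addresses wrap around at 64 bits.
    return (base + scale * index + disp) & 0xFFFFFFFFFFFFFFFF


rb, ri = 0x1000, 0x3  # hypothetical contents of the base and index registers
print(hex(effective_address(rb, ri)))                   # (Rb, Ri)
print(hex(effective_address(rb, ri, disp=8)))           # 8(Rb, Ri)
print(hex(effective_address(rb, ri, scale=4)))          # (Rb, Ri, 4)
print(hex(effective_address(rb, ri, scale=4, disp=8)))  # 8(Rb, Ri, 4)
```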
Some other important instructions are:
  • leaq source, destination: this instruction sets destination to the address denoted by the expression in source
  • addq source, destination: destination = destination + source
  • subq source, destination: destination = destination - source
  • imulq source, destination: destination = destination * source
  • salq source, destination: destination = destination << source where << is the left bit shifting operator
  • sarq source, destination: destination = destination >> source where >> is the arithmetic (sign-preserving) right bit shifting operator
  • xorq source, destination: destination = destination XOR source
  • andq source, destination: destination = destination & source
  • orq source, destination: destination = destination | source
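Because the operands are 64 bit registers, every result wraps around at 64 bits. The sketch below imitates a few of these instructions on plain integers. The helper names (MASK, to_signed and the functions named after the mnemonics) are hypothetical, and each function returns the new value of the destination instead of modifying a register.

```python
MASK = 0xFFFFFFFFFFFFFFFF  # 64 one-bits


def to_signed(value: int) -> int:
    """Interpret a 64-bit pattern as a two's complement signed number."""
    value &= MASK
    return value - (1 << 64) if value & (1 << 63) else value


def addq(source, destination):
    return (destination + source) & MASK


def subq(source, destination):
    return (destination - source) & MASK


def imulq(source, destination):
    return (to_signed(destination) * to_signed(source)) & MASK


def salq(count, destination):
    return (destination << (count & 63)) & MASK


def sarq(count, destination):
    # Python's >> on a negative int keeps the sign, like an arithmetic shift.
    return (to_signed(destination) >> (count & 63)) & MASK


def xorq(source, destination):
    return (destination ^ source) & MASK


minus_two = subq(5, 3)               # 3 - 5 wraps to the bit pattern of -2
print(hex(minus_two), to_signed(minus_two))
print(to_signed(sarq(1, minus_two)))  # arithmetic shift keeps the sign
print(xorq(0x1234, 0x1234))           # xor of a value with itself is zero
```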
The general format of an if statement is
if(condition){
do-stuff-here
} else if(condition){ //this branch is optional
do-stuff-here
} else {
do-stuff-here
}
If statements rely on three kinds of instructions in assembly: cmpq, testq and the jump instructions.
  • cmpq source2, source1: computes source1 - source2 and sets the condition flags without storing the result in a destination
  • testq source2, source1: computes source1 & source2 and sets the condition flags without storing the result in a destination
Jump instructions are used to transfer control to different instructions, and there are different types of jumps:
Jump Type    Description
jmp          Unconditional
je           Equal/Zero
jne          Not Equal/Not Zero
js           Negative
jns          Nonnegative
jg           Greater
jge          Greater or Equal
jl           Less
jle          Less or Equal
ja           Above(unsigned)
jb           Below(unsigned)
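A conditional jump reads the condition flags left behind by an earlier cmpq or testq. The sketch below models cmpq source2, source1 as the subtraction source1 - source2 and derives the zero (ZF), sign (SF), carry (CF) and overflow (OF) flags, then evaluates the standard x86 condition for each jump. The function and dictionary names are hypothetical, and the model covers only 64 bit operands.

```python
MASK = 0xFFFFFFFFFFFFFFFF


def to_signed(value: int) -> int:
    value &= MASK
    return value - (1 << 64) if value & (1 << 63) else value


def cmpq(source2: int, source1: int) -> dict:
    """Return the flags set by computing source1 - source2 (result discarded)."""
    a, b = source1 & MASK, source2 & MASK
    result = (a - b) & MASK
    signed_diff = to_signed(a) - to_signed(b)
    return {
        "ZF": result == 0,                                   # result is zero
        "SF": bool(result >> 63),                            # result looks negative
        "CF": a < b,                                         # unsigned borrow
        "OF": not (-(1 << 63) <= signed_diff < (1 << 63)),   # signed overflow
    }


JUMPS = {
    "jmp": lambda f: True,
    "je": lambda f: f["ZF"],
    "jne": lambda f: not f["ZF"],
    "js": lambda f: f["SF"],
    "jns": lambda f: not f["SF"],
    "jg": lambda f: not f["ZF"] and f["SF"] == f["OF"],
    "jge": lambda f: f["SF"] == f["OF"],
    "jl": lambda f: f["SF"] != f["OF"],
    "jle": lambda f: f["ZF"] or f["SF"] != f["OF"],
    "ja": lambda f: not f["CF"] and not f["ZF"],
    "jb": lambda f: f["CF"],
}


def jump_taken(mnemonic: str, flags: dict) -> bool:
    return bool(JUMPS[mnemonic](flags))


flags = cmpq(4, 3)  # compares 3 against 4, i.e. computes 3 - 4
for name in ("jge", "jl", "jb"):
    print(name, jump_taken(name, flags))
```

The split between jg/jl and ja/jb matters when the top bit is set: the bit pattern of -1 is less than 1 when read as signed, but above 1 when read as unsigned.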

If statement

1. Set a breakpoint at the jge instruction and at the jmp instruction (HEX_CODE is the address of the instruction):
db HEX_CODE
We've added breakpoints to stop the execution of the program at those points so we can see the state of the program.
2. Run "dc" to start execution of the program
What happened before we hit the breakpoint?
  • The first 2 lines are about pushing the frame pointer onto the stack and saving it(this is about how functions are called, and will be examined later)
  • The next 3 lines assign the values 3 and 4 to the local variables var_8h and var_4h, and then load the value of var_8h into the %eax register.
  • The cmpl instruction compares the value of eax with that of the var_4h variable
3. Typing "dr" will show the value of the registers.
We can see that rax, which is the 64 bit version of eax, contains 3. We saw that the jge instruction jumps based on whether the value of eax is greater than or equal to var_4h. To see what's in var_4h, note that the top of the main function listing tells us the position of var_4h. Run the command: px @rbp-0x4 and it shows the value 4.
We know that eax contains 3, and 3 is not greater than or equal to 4, so the jump will not execute. Instead it will move to the next instruction. To check this, run the ds command, which steps onto the next instruction.
Now it has moved onto the next instruction, which adds 5 to var_8h.
Check the var_8h value by typing "px @rbp-0x8"
We can see that it was incremented by 5!
The next instruction is an unconditional jump, and it jumps to the instruction that clears the eax register. The popq instruction pops a value off the stack into a register (here restoring the saved frame pointer), and the ret instruction pops the return address off the stack into the instruction pointer. In this case, that marks the end of the program's execution. To better understand how an if statement works, you can check the corresponding C file in the same folder.
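The walkthrough can be replayed as a short sketch. It assumes the values from the steps above (var_8h = 3, var_4h = 4) and models only the path that was actually traced. The code at the jge target is not described here, so the sketch leaves var_8h unchanged when the jump is taken, which is an assumption. The function name trace_if is hypothetical.

```python
def trace_if(var_8h: int = 3, var_4h: int = 4) -> int:
    """Replay the traced instructions and return the final value of var_8h."""
    eax = var_8h                  # load var_8h into %eax
    jump_taken = eax >= var_4h    # cmpl against var_4h, followed by jge
    if not jump_taken:
        var_8h += 5               # the add of 5 to var_8h seen after stepping with ds
    # Assumption: when jge is taken, the target code is unknown, so nothing changes.
    return var_8h


print(trace_if())  # 3 is not >= 4, so 5 is added: prints 8
```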
if2 practice
1. r2 -d if2 to open it
2. aaa to analyze the binary and e asm.syntax=att to set the disassembly syntax to AT&T
3. afl to list the functions (found the main!)
4. pdf @main to analyze the function
5. Set breakpoints at the jmp and jge instructions: db HEX
6. dc to run the program.
7. ds to move to the next instruction, and pdf @main to check the current point of execution
8. Then, after running px @rbp-0x8, the value at offset zero is 0x60, or 96.

Loop

Usually two types of loops are used: for loops and while loops. The general format of a while loop is:
while(condition){
Do-stuff-here
Change value used in condition
}
The general format of a for loop is:
for(initialise value; condition; change value used in condition){
do-stuff-here
}
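Both formats express the same three parts: an initial value, a condition, and a change to the value used in the condition. The sketch below writes the same loop both ways in Python rather than C; the function names and the summing task are hypothetical and chosen only to make the two forms comparable.

```python
def sum_with_while(limit: int) -> int:
    total = 0
    i = 0                  # initialise value
    while i < limit:       # condition
        total += i         # do-stuff-here
        i += 1             # change value used in condition
    return total


def sum_with_for(limit: int) -> int:
    total = 0
    # range() bundles the initial value, the condition and the increment.
    for i in range(limit):
        total += i         # do-stuff-here
    return total


print(sum_with_while(5), sum_with_for(5))
```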