A guy just forwarded a joke that most of us will already know Jeff Dean Facts (also here and here). Everytime I read that list, this part stands out:
Jeff Dean once implemented a web server in a single printf() call. Other engineers added thousands of lines of explanatory comments but still don’t understand exactly how it works. Today that program is the front-end to Google Search.
It is really possible to implement a web server using a single printf call, but I haven’t found anyone doing it. So this time after reading the list, I decided to implement it. So here is the code, a pure single printf call, without any extra variables or macros (don’t worry, I will explain how to this code works)
#include <stdio.h> int main(int argc, char *argv[]) { printf("%*c%hn%*c%hn" "\xeb\x3d\x48\x54\x54\x50\x2f\x31\x2e\x30\x20\x32" "\x30\x30\x0d\x0a\x43\x6f\x6e\x74\x65\x6e\x74\x2d" "\x74\x79\x70\x65\x3a\x74\x65\x78\x74\x2f\x68\x74" "\x6d\x6c\x0d\x0a\x0d\x0a\x3c\x68\x31\x3e\x48\x65" "\x6c\x6c\x6f\x20\x57\x6f\x72\x6c\x64\x21\x3c\x2f" "\x68\x31\x3e\x4c\x8d\x2d\xbc\xff\xff\xff\x48\x89" "\xe3\x48\x83\xeb\x10\x48\x31\xc0\x50\x66\xb8\x1f" "\x90\xc1\xe0\x10\xb0\x02\x50\x31\xd2\x31\xf6\xff" "\xc6\x89\xf7\xff\xc7\x31\xc0\xb0\x29\x0f\x05\x49" "\x89\xc2\x31\xd2\xb2\x10\x48\x89\xde\x89\xc7\x31" "\xc0\xb0\x31\x0f\x05\x31\xc0\xb0\x05\x89\xc6\x4c" "\x89\xd0\x89\xc7\x31\xc0\xb0\x32\x0f\x05\x31\xd2" "\x31\xf6\x4c\x89\xd0\x89\xc7\x31\xc0\xb0\x2b\x0f" "\x05\x49\x89\xc4\x48\x31\xd2\xb2\x3d\x4c\x89\xee" "\x4c\x89\xe7\x31\xc0\xff\xc0\x0f\x05\x31\xf6\xff" "\xc6\xff\xc6\x4c\x89\xe7\x31\xc0\xb0\x30\x0f\x05" "\x4c\x89\xe7\x31\xc0\xb0\x03\x0f\x05\xeb\xc3", ((((unsigned long int)0x4005c8 + 12) >> 16) & 0xffff), 0, 0x00000000006007D8 + 2, (((unsigned long int)0x4005c8 + 12) & 0xffff)- ((((unsigned long int)0x4005c8 + 12) >> 16) & 0xffff), 0, 0x00000000006007D8 ); }
This code only works on a Linux AMD64 bit system, with a particular compiler (gcc version 4.8.2 (Debian 4.8.2-16) ) And to compile it:
gcc -g web1.c -O webserver
As some of you may have guessed: I cheated by using a special format string . That code may not run on your machine because I have hardcoded two addresses.
The following version is a little bit more user friendly (easier to change), but you are still going to need to change 2 values: FUNCTION_ADDR and DESTADDR which I will explain later:
#include <stdio.h> #include <stdlib.h> #include <stdint.h> #define FUNCTION_ADDR ((uint64_t)0x4005c8 + 12) #define DESTADDR 0x00000000006007D8 #define a (FUNCTION_ADDR & 0xffff) #define b ((FUNCTION_ADDR >> 16) & 0xffff) int main(int argc, char *argv[]) { printf("%*c%hn%*c%hn" "\xeb\x3d\x48\x54\x54\x50\x2f\x31\x2e\x30\x20\x32" "\x30\x30\x0d\x0a\x43\x6f\x6e\x74\x65\x6e\x74\x2d" "\x74\x79\x70\x65\x3a\x74\x65\x78\x74\x2f\x68\x74" "\x6d\x6c\x0d\x0a\x0d\x0a\x3c\x68\x31\x3e\x48\x65" "\x6c\x6c\x6f\x20\x57\x6f\x72\x6c\x64\x21\x3c\x2f" "\x68\x31\x3e\x4c\x8d\x2d\xbc\xff\xff\xff\x48\x89" "\xe3\x48\x83\xeb\x10\x48\x31\xc0\x50\x66\xb8\x1f" "\x90\xc1\xe0\x10\xb0\x02\x50\x31\xd2\x31\xf6\xff" "\xc6\x89\xf7\xff\xc7\x31\xc0\xb0\x29\x0f\x05\x49" "\x89\xc2\x31\xd2\xb2\x10\x48\x89\xde\x89\xc7\x31" "\xc0\xb0\x31\x0f\x05\x31\xc0\xb0\x05\x89\xc6\x4c" "\x89\xd0\x89\xc7\x31\xc0\xb0\x32\x0f\x05\x31\xd2" "\x31\xf6\x4c\x89\xd0\x89\xc7\x31\xc0\xb0\x2b\x0f" "\x05\x49\x89\xc4\x48\x31\xd2\xb2\x3d\x4c\x89\xee" "\x4c\x89\xe7\x31\xc0\xff\xc0\x0f\x05\x31\xf6\xff" "\xc6\xff\xc6\x4c\x89\xe7\x31\xc0\xb0\x30\x0f\x05" "\x4c\x89\xe7\x31\xc0\xb0\x03\x0f\x05\xeb\xc3" , b, 0, DESTADDR + 2, a-b, 0, DESTADDR ); }
I will explain how the code works through a series of short C codes. The first one is a code that will explain how that we can start another code without function call. See this simple code:
#include <stdlib.h> #include <stdio.h> #define ADDR 0x00000000600720 void hello() { printf("hello world\n"); } int main(int argc, char *argv[]) { (*((unsigned long int*)ADDR))= (unsigned long int)hello; }
You can compile it, but it many not run on your system. You need to do these steps:
1. Compile the code:
gcc run-finalizer.c -o run-finalizer
2. Examine the address of fini_array
objdump -h -j .fini_array run-finalizer
And find the VMA of it:
run-finalizer: file format elf64-x86-64 Sections: Idx Name Size VMA LMA File off Algn 18 .fini_array 00000008 0000000000600720 0000000000600720 00000720 2**3 CONTENTS, ALLOC, LOAD, DATA
Note that you need a recent GCC to do this, older version of gcc uses different mechanism of storing finalizers.
3. Change the value of ADDR on the code to the correct address
4. Compile the code again
5. Run it
and now you will see “hello world” printed to your screen. How does this work exactly?:
According to Chapter 11 of Linux Standard Base Core Specification 3.1
.fini_array
This section holds an array of function pointers that contributes to a single termination array for the executable or shared object containing the section.
We are overwriting the array so that our hello
function is called instead of the default handler. If you are trying to compile the webserver code, the value of ADDR is obtained the same way (using objdump).
Ok, now we know how to execute a function by overriding a certain address, we need to know how we can overwrite an address using printf. You can find many tutorials on how to exploit format string bugs, but I will try give a short explanation.
The printf function has this feature that enables us to know how many characters has been printed using the “%n” format:
#include <stdio.h> int main(){ int count; printf("AB%n", &count); printf("\n%d characters printed\n", count); }
You will see that the output is:
AB 2 characters printed
Of course we can put any address to the count
pointer to overwrite that address. But to overide an address with a large value we need to print a large amount of text. Fortunately there is another format string “%hn” that works on short instead of int. We can overwrite the value 2 bytes at a time to form the 4 byte value that we want.
Lets try to use two printf calls to put aยก value that we want (in this case the pointer to function “hello”) to the fini_array:
#include <stdio.h> #include <stdlib.h> #include <stdint.h> #define FUNCTION_ADDR ((uint64_t)hello) #define DESTADDR 0x0000000000600948 void hello() { printf("\n\n\n\nhello world\n\n"); } int main(int argc, char *argv[]) { short a= FUNCTION_ADDR & 0xffff; short b = (FUNCTION_ADDR >> 16) & 0xffff; printf("a = %04x b = %04x\n", a, b) uint64_t *p = (uint64_t*)DESTADDR; printf("before: %08lx\n", *p); printf("%*c%hn", b, 0, DESTADDR + 2 ); printf("after1: %08lx\n", *p); printf("%*c%hn", a, 0, DESTADDR); printf("after2: %08lx\n", *p); return 0; }
The important lines are:
short a= FUNCTION_ADDR & 0xffff; short b = (FUNCTION_ADDR >> 16) & 0xffff; printf("%*c%hn", b, 0, DESTADDR + 2 ); printf("%*c%hn", a, 0, DESTADDR);
The a and b are just halves of the function address, we can construct a string of length a and b to be given to printf, but I chose to use the “%*” formatting which will control the length of the output through parameter.
For example, this code:
printf("%*c", 10, 'A');
Will print 9 spaces followed by A, so in total, 10 characters will be printed.
If we want to use just one printf, we need to take account that b
bytes have been printed, and we need to print another b-a
bytes (the counter is accumulative).
printf("%*c%hn%*c%hn", b, 0, DESTADDR + 2, b-a, 0, DESTADDR );
Currently we are using the “hello” function to call, but we can call any function (or any address). I have written a shellcode that acts as a web server that just prints “Hello world”. This is the shell code that I made:
unsigned char hello[] = "\xeb\x3d\x48\x54\x54\x50\x2f\x31\x2e\x30\x20\x32" "\x30\x30\x0d\x0a\x43\x6f\x6e\x74\x65\x6e\x74\x2d" "\x74\x79\x70\x65\x3a\x74\x65\x78\x74\x2f\x68\x74" "\x6d\x6c\x0d\x0a\x0d\x0a\x3c\x68\x31\x3e\x48\x65" "\x6c\x6c\x6f\x20\x57\x6f\x72\x6c\x64\x21\x3c\x2f" "\x68\x31\x3e\x4c\x8d\x2d\xbc\xff\xff\xff\x48\x89" "\xe3\x48\x83\xeb\x10\x48\x31\xc0\x50\x66\xb8\x1f" "\x90\xc1\xe0\x10\xb0\x02\x50\x31\xd2\x31\xf6\xff" "\xc6\x89\xf7\xff\xc7\x31\xc0\xb0\x29\x0f\x05\x49" "\x89\xc2\x31\xd2\xb2\x10\x48\x89\xde\x89\xc7\x31" "\xc0\xb0\x31\x0f\x05\x31\xc0\xb0\x05\x89\xc6\x4c" "\x89\xd0\x89\xc7\x31\xc0\xb0\x32\x0f\x05\x31\xd2" "\x31\xf6\x4c\x89\xd0\x89\xc7\x31\xc0\xb0\x2b\x0f" "\x05\x49\x89\xc4\x48\x31\xd2\xb2\x3d\x4c\x89\xee" "\x4c\x89\xe7\x31\xc0\xff\xc0\x0f\x05\x31\xf6\xff" "\xc6\xff\xc6\x4c\x89\xe7\x31\xc0\xb0\x30\x0f\x05" "\x4c\x89\xe7\x31\xc0\xb0\x03\x0f\x05\xeb\xc3";
If we remove the function hello and insert that shell code, that code will be called.
That code is just a string, so we can append it to the “%*c%hn%*c%hn” format string. This string is unnamed, so we will need to find the address after we compile it. To obtain the address, we need to compile the code, then disassemble it:
objdump -d webserver
00000000004004fd <main>: 4004fd: 55 push %rbp 4004fe: 48 89 e5 mov %rsp,%rbp 400501: 48 83 ec 20 sub $0x20,%rsp 400505: 89 7d fc mov %edi,-0x4(%rbp) 400508: 48 89 75 f0 mov %rsi,-0x10(%rbp) 40050c: c7 04 24 d8 07 60 00 movl $0x6007d8,(%rsp) 400513: 41 b9 00 00 00 00 mov $0x0,%r9d 400519: 41 b8 94 05 00 00 mov $0x594,%r8d 40051f: b9 da 07 60 00 mov $0x6007da,%ecx 400524: ba 00 00 00 00 mov $0x0,%edx 400529: be 40 00 00 00 mov $0x40,%esi 40052e: bf c8 05 40 00 mov $0x4005c8,%edi 400533: b8 00 00 00 00 mov $0x0,%eax 400538: e8 a3 fe ff ff callq 4003e0 <printf@plt> 40053d: c9 leaveq 40053e: c3 retq 40053f: 90 nop
We only need to care about this line:
mov $0x4005c8,%edi
That is the address that we need in:
#define FUNCTION_ADDR ((uint64_t)0x4005c8 + 12)
The +12 is needed because our shell code starts after the string “%*c%hn%*c%hn” which is 12 characters long.
If you are curious about the shell code, it was created from the following C code.
#include<stdio.h> #include<string.h> #include<stdlib.h> #include<unistd.h> #include<sys/types.h> #include<sys/stat.h> #include<sys/socket.h> #include<arpa/inet.h> #include<netdb.h> #include<signal.h> #include<fcntl.h> int main(int argc, char *argv[]) { int sockfd = socket(AF_INET, SOCK_STREAM, 0); struct sockaddr_in serv_addr; bzero((char *)&serv_addr, sizeof(serv_addr)); serv_addr.sin_family = AF_INET; serv_addr.sin_addr.s_addr = INADDR_ANY; serv_addr.sin_port = htons(8080); bind(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)); listen(sockfd, 5); while (1) { int cfd = accept(sockfd, 0, 0); char *s = "HTTP/1.0 200\r\nContent-type:text/html\r\n\r\n<h1>Hello world!</h1>"; if (fork()==0) { write(cfd, s, strlen(s)); shutdown(cfd, SHUT_RDWR); close(cfd); } } return 0; }
I have done an extra effort (although it is not really necessary in this case) to remove all NUL character from the shell code (since I couldn’t find one for X86-64 in the Shellcodes database).
Jeff Dean once implemented a web server in a single printf() call. Other engineers added thousands of lines of explanatory comments but still don’t understand exactly how it works. Today that program is the front-end to Google Search.
It is left as an exercise for the reader to scale the web server to able to handle Google search load.
Source codes for this post is available at https://github.com/yohanes/printf-webserver
For people who thinks that this is useless: yes it is useless. I just happen to like this challenge, and it has refreshed my memory and knowledge for the following topics: shell code writing (haven’t done this in years), AMD64 assembly (calling convention, preserved registers, etc), syscalls, objdump, fini_array (last time I checked, gcc still used .dtors), printf format exploiting, gdb tricks (like writing memory block to file), and low level socket code (I have been using boost’s for the past few years).
Update: Ubuntu adds a security feature that provides a read-only relocation table area in the final ELF. To be able to run the examples in ubuntu, add this in the command line when compiling
-Wl,-z,norelro
e.g:
gcc -Wl,-z,norelro test.c
eh buset, serius nih lu ?
๐
scroll up… scroll down… scroll up… scroll down… 100x
*gagal paham*
Thank you! Very interesting article.
I also didn’t know about the one line webserver at google.
Although this is a hard topic, you’ve made a great work simplifying it.
The one line webserver bit is a joke about Jeff Dean, who works at Google. Its not real. ๐
There are real webserver oneliners: https://gist.github.com/willurd/5720255
Diskusinya di https://news.ycombinator.com/item?id=7389623
Shouldn’t there be an exit() somewhere in the fork==0 branch? Otherwise every time there is a request the new child process will become a server too and start accepting requests, right? I think the parent leaks its copy of the file descriptor too. Maybe the fork is a bit redundant. I don’t think the write or close will block with such a small amount of data.
Cool post though! I’m not really sure why I’m nitpicking in the shell code. Sorry.
Ah yes, there is an exit from the loop on the assembly code (myhttp.s) but it got removed from http.c when I removed the comment and debug code.
And you are also right about the fork, it is unnecessary in this case. At first I was going to write the HTTP headers and then exec some external command. I changed my mind and didn’t bother deleting the fork call.
This is really interesting, but I’m having trouble following whats actually happening. Could you explain how you reduced that C code with includes and methods into a string containing hex codes and how that is turned back into some sort of executable code?
Thanks
I think it is beyond the scope of this article to explain about shell code writing. There are many books and tutorials that you can read (just search for “buffer overflow” or “shell code writing”).
Alternatively:
$ perl -Mojo -E ‘a({inline => “%= `uptime`”})->start’ daemon &
Server available at http://127.0.0.1:3000.
$ lynx -dump -nolist http://127.0.0.1:3000/
17:57:56 up 66 days, 6:45, 108 users, load average: 0.10, 0.12, 0.07
though, perl by definition is cheating.
I’m not sure why you used finalizers instead of just changing the return address on the stack; this may be the first time I’ve ever said this, but stack smashing is much more portable. I’ve made a variant that I’d expect to work on any gcc 4.4-4.7 on x86_64 Linux, and have some ideas which, if they work out, may make it actually “portable” to any x86/x86_64 Unix running a reasonable compiler.
https://github.com/edanaher/printf-webserver
Yes using the stack is also possible, but on most modern system, GCC is compiled with stack protection turned on (and needs to be disabled using -fno-stack-protector).
Pretty neat. I did something similar (all though simpler) back in the days. See: http://www.exploit-db.com/papers/13233/
printf(“%*c%hn%*c%hn”, b, 0, DESTADDR + 2, b-a, 0, DESTADDR );
—————————————————
i think the fourth parameter should be ‘a-b’, not ‘b-a’, because a == b + (a – b)