hoge

Tuesday, October 28, 2014

Sandbox of my golf server

I have a golf server and it executes code posted by random users in a sandbox. This post explains how the sandbox is implemented. Note that my golf server has an unusual requirement for the sandbox: it's not mainly for security, but for preventing cheating. I think this makes my sandbox slightly different from other implementations.

My golf server consists of the frontend and the backend. The frontend server is a normal Ruby program. This speaks HTTP and sends the submitted code to the execution server in the backend. The backend is running the execution server. This is a TCP server written in Ruby. This talks with the frontend, sets up the sandbox, and runs the submitted code.

VM

I don't mind too much even if a user manages to break my server. It doesn't contain very important information. However, I don't want an attacker to use my server to attack another server, even if the attacker manages to become the root user of the backend.

So, the execution server is running in a VM (Xen), and the VM isn't connected to the external network. It can only connect to the host, which runs the web frontend.

System calls

I implemented a simple kernel module to hook system calls which I want to audit. Its source code is

https://github.com/shinh/ags/blob/master/be/modules/sandbox.c

To modify the system call table, you need to know the address of the syscall table. This address is just hardcoded:

#define SYS_CALL_TABLE ((void**)0xc1295348)

You can obtain this value from /boot/System.map-`uname -r`.

Several syscalls, such as setsid, setuid, and sendto, just return -EPERM.

I think the requirement of golf servers is somewhat unique. Unlike normal sandbox implementations for security, sometimes we need to allow execve. Some language implementations, such as Scala and R, use execve internally, even for a simple "hello world" program. However, we don't want users to call execve an arbitrary number of times. It's not fun if the shortest C program always starts with

main(){system("perl ...");}

So, my sandbox allows users to call execve, but it counts the number of execve syscalls called. If this number is larger than the number required to run a "hello world" program, the frontend rejects the solution even if the output is correct.
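The acceptance rule in the previous paragraph could be sketched like this. It is only an illustration in Python, not the server's code (the real frontend is Ruby); the function name `accept` and its parameters are hypothetical.

```python
def accept(output: str, expected: str,
           execve_count: int, baseline_execves: int) -> bool:
    """Hypothetical frontend check: correct output and no extra execve calls.

    baseline_execves is the number of execve calls that a "hello world"
    program needs in the same language (an assumed per-language constant).
    """
    if execve_count > baseline_execves:
        # e.g. a C entry that shells out with system("perl ...")
        return False
    return output == expected
```

With this rule, a solution whose output is right but which spawned extra programs is still rejected.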

The execution server needs to interact with the kernel module to initialize the sandbox and obtain some numbers (e.g., the number of execve calls). I think normal kernel modules would use procfs or something. However, as I know almost nothing about the kernel and don't want to depend on the kernel's internal structures, my sandbox provides the interface through hooked syscalls.

My sandbox abuses getpriority and setpriority for this purpose. I chose them just because their interfaces are convenient for my purpose.

int getpriority(int which, id_t who);
int setpriority(int which, id_t who, int prio);

If the parameter "which" is a magic number (1764), they get/set internal values in the sandbox. For example, you can get the number of execve called by

getpriority(1764, __NR_execve)

It also provides a weird feature. If you call setpriority(1764, __NR_getpid, pid), the next PID is set to the specified number. Golfers sometimes want to assume that the PID is a specific number. For example, the shortest Ruby program which outputs 9999 is "p$$" ($$ is the PID in Ruby). Without this interface, users would need to DoS my server to get the PID they want. There is also a web frontend for this weird feature.
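To make the interface concrete, here is a small user-space model of the dispatch described above. It is a sketch in Python, not the kernel module: the class `SandboxModel` and its method `record` are hypothetical, and the syscall numbers assume i386 (where execve is 11 and getpid is 20).

```python
MAGIC_WHICH = 1764   # magic value of "which", from the post
NR_EXECVE = 11       # assumption: i386 syscall numbers
NR_GETPID = 20

class SandboxModel:
    """Toy model of the hooked getpriority/setpriority interface."""

    def __init__(self):
        self.counts = {}      # syscall number -> number of calls seen
        self.next_pid = None  # PID requested for the next process

    def record(self, nr):
        # In the real module this would happen inside the syscall hook.
        self.counts[nr] = self.counts.get(nr, 0) + 1

    def getpriority(self, which, who):
        if which == MAGIC_WHICH:
            return self.counts.get(who, 0)
        raise NotImplementedError("would fall through to the real getpriority")

    def setpriority(self, which, who, prio):
        if which == MAGIC_WHICH and who == NR_GETPID:
            self.next_pid = prio
            return 0
        raise NotImplementedError("other cases are not modelled here")
```

In this model, `getpriority(1764, NR_EXECVE)` returns the execve count and `setpriority(1764, NR_GETPID, 9999)` requests PID 9999 for the next process.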

Alternatives

ptrace: At first, my sandbox was using this. However, this is slow.
seccomp-bpf: When I implemented my sandbox, I didn't know about this. Even if I had known, I think I couldn't have used it because my kernel didn't support it. If I write a sandbox for a golf server again, I'd use this, I think.
SELinux: I tried to study this, but it was too difficult for me :( It seemed SELinux does not satisfy the unique requirement (count the number of syscall invocations instead of just letting them fail), but I'm not sure.
Modify language implementations: codegolf.com was doing this. But this approach is impractical to support many languages. As of writing, my server supports 102 languages.

Files

If a file can persist, users can easily cheat. They submit code which writes a solution to a file, and then submit other code which just outputs its content. This makes problems like Text Compression very boring. Everyone submits code like "cat /tmp/a".

So, all files must be removed after the execution. To achieve this, I removed all writable locations except /dev/shm. As some languages require /tmp, /var/tmp and $HOME, they are prepared as symlinks to /dev/shm. /dev/shm is a tmpfs and re-mounted after an execution.

Misc.

My server is not robust against DoS attacks. However, I don't want innocent users to accidentally stop my server often. To mitigate the risk, the execution server uses setrlimit to limit NPROC and RSS. The number of fork calls is also limited by the kernel module.
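As an illustration of the setrlimit part, the following Python sketch lowers NPROC and RSS in a child process before it runs a command. This is not the execution server's code (which is Ruby); the helper names and the limit values (64 processes, 256 MiB) are assumptions made up for the example.

```python
import resource
import subprocess
import sys

def make_limiter(max_procs: int, max_rss_bytes: int):
    """Return a function that lowers NPROC and RSS limits in the child."""
    def apply_limits():
        for which, wanted in ((resource.RLIMIT_NPROC, max_procs),
                              (resource.RLIMIT_RSS, max_rss_bytes)):
            _soft, hard = resource.getrlimit(which)
            # An unprivileged process may lower a limit but not raise the hard one.
            cap = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
            resource.setrlimit(which, (cap, cap))
    return apply_limits

def run_limited(argv, max_procs=64, max_rss_bytes=256 * 1024 * 1024):
    """Run argv with the limits applied between fork and exec."""
    return subprocess.run(argv,
                          preexec_fn=make_limiter(max_procs, max_rss_bytes),
                          capture_output=True, text=True)
```

For example, `run_limited([sys.executable, "-c", "print(1)"])` runs a child Python with both limits in place.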

If you can run a process which persists, the process can hold a solution and pass it to a subsequent process. To prevent this, the execution server kills all user processes except itself after it handles a submission.

Once, a smart person managed to cheat on my server by abusing IPC. So, my server limits the size of IPC buffers by setting values in procfs.

Summary

I don't think my server is uncheatable, but this sandbox seems to be enough to prevent users from cheating too easily.

Update: crontab and syslog

int-e found that you can use crontab or syslog to store arbitrary data, and actually demonstrated the crontab exploit. I added /etc/cron.deny to prevent users from using crontab, and disallowed the syslog syscall to prevent the use of dmesg.

Tuesday, April 2, 2013

104B Hello, world! ELF binary for x86-64 linux

I created a tiny (104 bytes) Hello, world! ELF binary for x86-64 Linux after seeing this page. http://shinh.skr.jp/obf/hello_linux_elf_x64.out Here is the assembly code in NASM.

BITS 64
        org     0x01000000
hello:
        db      0x7F, "ELF"     ; e_ident
        db      "o, world!", 10
        db      0, 0
        dw      2               ; e_type
        dw      62              ; e_machine
        dd      1               ; e_version
        dq      _start          ; e_entry
        dq      phdr - $$       ; e_phoff
                                ; e_shoff
cont2:
        mov     AL, 4           ; write = 4
        int     0x80
        xchg    EAX, EDI        ; exit(0)
        xchg    EAX, EBX        ; exit = 1
        int     0x80
phdr:
        dd      1               ; e_flags & p_type
        dw      7               ; e_ehsize & p_flags
        dw      56              ; e_phentsize & p_flags
        dw      1               ; e_phnum & p_offset
        dw      0               ; e_shentsize & p_offset
        dw      0               ; e_shnum & p_offset
        dw      0               ; e_shstrndx & p_offset
        dq      $$ + 1          ; p_vaddr
                                ; p_paddr
_start:
        inc     EBX             ; stdout = 1
        mov     DL, 14          ; strlen = 14
        inc     ECX
        jmp     cont
        dq      filesize - 1    ; p_filesz
                                ; p_memsz
cont:
        shl     ECX, 24
        db      0x25            ; and EAX, 0 (fall through)
        db      0, 0, 0, 0
                                ; p_align
        xor     dword[RCX], 0x2a202037
        jmp     cont2
filesize equ    $ - $$

I also created a spreadsheet to explain this. If this is interesting to you, you may want to check my collection as well. My x86-64 code is much bigger than the 58B hello because both the ELF header and the program header on x86-64 are much bigger than on x86-32. I couldn't find a better way to overlap the ELF header and the program header, and my code keeps all code and data inside these headers, so I'm assuming 104B is optimal. Although this work should be easier than binary golf for x86, there were a few challenges:

The 1-byte inc/dec instructions are gone.
As mmap for small addresses isn't allowed on recent Linux (see /proc/sys/vm/mmap_min_addr), I used addresses bigger than 16 bits. In fact, the 58-byte hello for x86-32 won't work on recent Linux distributions for this reason. I needed to use inc and shl to set the address of "Hello, world!\n" in ECX.
As we cannot access 0x0000-0x1000, most data cannot be executed. For example, 0x0000 is add [RAX], AL.
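The arithmetic behind the listing can be checked with a few lines of Python. This is just a sanity check, not part of the original work; it assumes the standard Elf64_Ehdr and Elf64_Phdr layouts, that `phdr:` sits at file offset 48 (the position of e_flags), and that ECX is zero at process start.

```python
import struct

# Field layouts of Elf64_Ehdr and Elf64_Phdr, little-endian, no padding.
EHDR_SIZE = struct.calcsize("<16sHHIQQQIHHHHHH")  # 64 bytes
PHDR_SIZE = struct.calcsize("<IIQQQQQQ")          # 56 bytes

PHDR_OFFSET = 48  # phdr: overlaps the ELF header from e_flags onwards
FILE_SIZE = max(EHDR_SIZE, PHDR_OFFSET + PHDR_SIZE)

# "inc ECX" then "shl ECX, 24" builds the load address without a 4-byte immediate.
ORG = 0x01000000
ecx = (0 + 1) << 24

# "xor dword[RCX], 0x2a202037" patches the ELF magic in memory.
magic = int.from_bytes(b"\x7fELF", "little")
patched = (magic ^ 0x2A202037).to_bytes(4, "little")
message = patched + b"o, world!\n"

print(FILE_SIZE, hex(ecx), message)
```

The file size comes out as 104, ECX equals the org address, and the first 14 bytes of the image become "Hello, world!\n", which matches the value loaded into DL.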

Thursday, December 6, 2012

ShaFuck is not unbeatable

When I found ShaFuck, I really loved this idea and I thought it's indeed impossible to write code in this language.

However, I found a way to write code in ShaFuck and I could write a script which translates Brainfuck code into ShaFuck code.

http://shinh.skr.jp/obf/bf2sf.rb

Here is some evidence:

> ./shafuck hello.sf
Hello, world!
> ./shafuck cal.sf
2012 03
                   1
 2  3  4  5  6  7  8
 9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
30 31

The idea was very simple. We cannot find a 1024-byte chunk whose SHA1 becomes 20 fully meaningful BF commands (the probability is 1/32^20 if SHA1 is random), but we can find a chunk which becomes 3 bytes of BF commands and 17 bytes of junk. Junk is allowed as long as it isn't executed (characters other than the eight BF commands are comments anyway), so we need to skip the junk. Luckily, I could find a translation:

+ => +>[ (17B)  (18B) ]<
- => ->[ (17B)  (18B) ]<
> => >[  (18B)  (18B) ]>
< => <[  (18B)  (18B) ]<
. => .>[ (17B)  (18B) ]<
, => ,>[ (17B)  (18B) ]<
[ => >[  (18B)  (17B) ]<[
] => >[  (18B)  (17B) ]<]

With my translation, every BF command is translated into 2 ShaFuck code chunks (i.e., 2048 bytes of code). The trick is that we never modify the contents of odd addresses, so they always hold zero and we can use them to skip the junk. Note that > and < move the pointer twice.
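The translation table can be tried out at the Brainfuck level, without SHA1. The sketch below is not bf2sf.rb: it expands each BF command into the 40-byte pattern from the table, using "#" as placeholder junk (real junk is whatever SHA1 produces), and runs both versions on a tiny interpreter written for this example.

```python
J17, J18 = "#" * 17, "#" * 18  # placeholder junk; real junk comes from SHA1

TABLE = {
    "+": "+>[" + J17 + J18 + "]<",
    "-": "->[" + J17 + J18 + "]<",
    ">": ">[" + J18 + J18 + "]>",
    "<": "<[" + J18 + J18 + "]<",
    ".": ".>[" + J17 + J18 + "]<",
    ",": ",>[" + J17 + J18 + "]<",
    "[": ">[" + J18 + J17 + "]<[",
    "]": ">[" + J18 + J17 + "]<]",
}

def expand(code):
    """Expand BF into the junk-skipping form (two 20-byte digests per command)."""
    return "".join(TABLE.get(c, "") for c in code)

def run_bf(code, stdin=""):
    """Minimal BF interpreter; returns (output, tape as a dict)."""
    stack, match = [], {}
    for i, c in enumerate(code):
        if c == "[":
            stack.append(i)
        elif c == "]":
            j = stack.pop()
            match[i], match[j] = j, i
    tape, ptr, pc, out, inp = {}, 0, 0, [], list(stdin)
    while pc < len(code):
        c = code[pc]
        if c == "+":
            tape[ptr] = (tape.get(ptr, 0) + 1) % 256
        elif c == "-":
            tape[ptr] = (tape.get(ptr, 0) - 1) % 256
        elif c == ">":
            ptr += 1
        elif c == "<":
            ptr -= 1
        elif c == ".":
            out.append(chr(tape.get(ptr, 0)))
        elif c == ",":
            tape[ptr] = ord(inp.pop(0)) if inp else 0
        elif c == "[" and tape.get(ptr, 0) == 0:
            pc = match[pc]  # skip the loop body (this is how junk is skipped)
        elif c == "]" and tape.get(ptr, 0) != 0:
            pc = match[pc]
        pc += 1
    return "".join(out), tape
```

Running `run_bf(expand(prog))` should behave like `run_bf(prog)`, except that the data lives only in even cells and every odd cell stays zero.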

I found code chunks which will be translated into the above sequences by

http://shinh.skr.jp/obf/sf_find_ops.cc

I used sha1.c and sha1.h in shafuck's reference implementation. It took about 3 minutes to find code chunks we need.
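The search itself is plain brute force. The following Python sketch is not sf_find_ops.cc; it only shows the idea for one chunk shape, a digest that starts with given BF commands and is followed by junk. The function name, the counter-based candidates, and the rule that the junk must not contain "[" or "]" are my assumptions for the example.

```python
import hashlib
import itertools

CHUNK_SIZE = 1024  # bytes of ShaFuck source per SHA1 digest, as described in the post

def find_chunk(prefix: bytes, max_tries: int = 5_000_000):
    """Search for a 1024-byte chunk whose SHA1 starts with `prefix`.

    Returns (chunk, digest), or None if nothing is found within max_tries.
    """
    for n in itertools.islice(itertools.count(), max_tries):
        # Hypothetical candidate generator: a zero-padded decimal counter.
        chunk = str(n).encode().rjust(CHUNK_SIZE, b"0")
        digest = hashlib.sha1(chunk).digest()
        junk = digest[len(prefix):]
        # Assumed safety rule: stray brackets in the junk would break the skip loop.
        if digest.startswith(prefix) and b"[" not in junk and b"]" not in junk:
            return chunk, digest
    return None
```

A 2-byte prefix such as `b">["` needs about 65536 tries on average, while a 3-byte prefix such as `b"+>["` needs about 256 times more, so a compiled program is the better tool for the full table.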

I used shafuck-0.2 for this entry. Future versions may not have this vulnerability. I think the simplest "fix" for this issue is validating code after calculating SHA1 and before running code.

Wednesday, July 18, 2012

ICFP Programming Contest 2012

The problem was to write a program which automatically solves Boulder Dash efficiently (the fewer turns you use, the better).

As usual, I joined the lightning division (the first 24 hours of this 72 hours long contest) with C++. However, I decided to do something different this year because I was getting a bit bored of this kind of topcoder marathon match style problems. So, I wrote the boulder dash solver in Befunge:

http://shinh.skr.jp/dat_dir/bd.b98

I guess this is one of the biggest hand-written pieces of Befunge code. Technically, this isn't Befunge-93 code because it uses an address space bigger than 80x25, so this code requires a Funge-98 implementation like cfunge. However, I think it would be OK to claim this is Befunge code because I only used Befunge-93 operations. Anyway, it wasn't easy to write this code, although I think I'm fairly good at Befunge (I once won a Befunge contest at codeforces). Like other participants, I did almost nothing except write this code, but I finished my simulator in Befunge just 6 hours before the end of the contest...

The following is the full package I submitted. This includes a Befunge interpreter implementation written by me during this contest, some Makefile and README stuff, code for lightning division, etc.

http://shinh.skr.jp/dat_dir/icfp12.tgz

This is an attempt to describe something about my code. You can see a visualized image of dungeon around the bottom right of this image. This is just a memory dump. Befunge has 2D address space so we need no external visualizer.

BTW, I forgot to write something about ICFP programming contest 2011. It was great. The problem was better than all other contest problems I've ever seen, and the organizers prepared a fairly stable duel server where participants could fight each other. I took 6th place. It seemed the 5th place team was also a one-person team, and he got the judges' prize because he was a one-person team. So, I was the 2nd place one-person team, yay.

Sunday, May 13, 2012

The 20th IOCCC

I won the 20th IOCCC! http://ioccc.org/2011/whowon.html My code was a paint by number solver which looks like a paint by number problem. I think my code is decent. It is fairly obfuscated due to the size limit, bit operations, and the difficulty of paint by numbers itself. But obviously, akari.c is much better than mine...

Saturday, September 12, 2009

ICFP Programming Contest 2009

Surprisingly, I won first place in the ICFP programming contest 2009. I'm very grateful for this result - it was much better than I expected. I was in 9th place before the validation process.

This post describes the approach behind my solution. Here is my source code:

http://shinh.skr.jp/dat_dir/icfpc.zip

* FAQs

Before the detailed descriptions, I put some questions asked by other contestants.

Q. Did you go to the moon?
A. No, I couldn't. I briefly considered whether I could go to the moon, but I decided not to. As it takes a lot of fuel and the scores of far debris aren't big, I prioritized improving stability and arriving faster at nearby debris.

Q. Did you solve traveling salesman problem?
A. No. I chose the next target almost greedily. Please see the following details as well.

Q. Did you solve difficult mathematical formulas?
A. No. Though I tried to find an analytic solution, the effort was just a waste of time. Basically, I only used the Hohmann transfer. Even if we had an analytic solution, we would need some adjustment for errors due to discrete space anyway.

Q. 1 person team?
A. Yes. I always attend this contest alone because I'd like to do everything by myself.

Q. How long did you sleep?
A. Not sure, but maybe ~20 hours. I cannot write valid code with little sleep. It is the lesson I learned from previous contests.

Q. Why C++?
A. I love C++. By the way, I have nothing against functional programming languages. Actually, I tried some of them a bit (maybe I wrote from ~1k to ~3k lines of toy code for each of them). I thought OCaml was good but its standard library is poor. Perhaps I should have tried ExtLib or something like it. Haskell was slow when I tried it. I read some of the assembly GHC produced, but it was very difficult for me to understand... I tried a Scheme interpreter and it was an interpreter after all. Maybe I'll want to try compilers (Stalin?). Commercial Common Lisp implementations sound great, but I only have open source implementations. So, I think C++ is the most practical language for me for now. I know I'm biased - I have spent much more time on C++ than on other languages. I'm looking forward to functional programming fans continuing their great work.

Q. Why not other imperative languages?
A. I feel Java is less free than C++. As for C#, my experience with it is too small to discuss it. I'm (or was?) a fan of D and actually I used D for this contest three times (2004, 2006, and 2007), but I don't use it these days. Maybe I'll want to use it again for next year's contest.

* Detail

This post is based on the script of my talk for vidiowiki: http://vidiowiki.com/watch/m844dyn/ (Hmm... it's embarrassing to watch I'm talking with my poor English. But it is good experience to me.)

Overall, I think my approach wasn't so special. Just like many teams might do, I wrote a VM, a visualizer, a physics simulator, and a Hohmann transfer.

My VM was fast because it translates the input binary into C code. Otherwise it's normal.

The visualizer is also a normal stuff. Nothing to say about it. You can see a YouTube video at (sorry for its poor quality) http://www.youtube.com/watch?v=IUGiiFsnLLs

I also had a physics simulator. It calculates future states quickly. For two reasons, it was a bit tricky to implement the simulator correctly. The first reason is that the binary the organizers provided was wrong: the gravity from the moon didn't affect our spaceship. The second is that the gravity from the moon is calculated by g(t+1) = G*M/(p(t+1)-pm(t))^2 where p is the position of a body and pm is the position of the moon. I think pm(t) should have been pm(t+1). Anyway, with some reverse engineering work, I could implement the simulator correctly. Its fast simulation helped me a lot.

With these three tools, I implemented my strategy, which was basically, kind of bruteforce.

First, my simulator calculates the future positions of a target for 5000 seconds. The track should be an elliptic arc around the earth. Then, it checks whether I have a chance to reach a point which is close enough to the arc with an ideal Hohmann transfer.
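For reference, the ideal Hohmann transfer between two circular orbits has a closed form. The Python sketch below uses the textbook formulas, not code from my solution; the function name and the example numbers (a low Earth orbit to a geostationary orbit, in km and km/s) are only for illustration.

```python
import math

def hohmann(mu: float, r1: float, r2: float):
    """Ideal Hohmann transfer between circular orbits of radius r1 and r2.

    mu is the gravitational parameter G*M of the central body.
    Returns (dv1, dv2, transfer_time): the two velocity changes and
    the time spent on the transfer ellipse (half of its period).
    """
    dv1 = math.sqrt(mu / r1) * (math.sqrt(2 * r2 / (r1 + r2)) - 1)
    dv2 = math.sqrt(mu / r2) * (1 - math.sqrt(2 * r1 / (r1 + r2)))
    a = (r1 + r2) / 2  # semi-major axis of the transfer ellipse
    transfer_time = math.pi * math.sqrt(a ** 3 / mu)
    return dv1, dv2, transfer_time

# Example: Earth (mu in km^3/s^2), from a 6678 km orbit to a 42164 km orbit.
dv1, dv2, t = hohmann(398600.0, 6678.0, 42164.0)
```

For these example radii the two burns are roughly 2.4 km/s and 1.5 km/s. In the contest setting the real track differs from this ideal case, which is why an adjustment step is needed afterwards.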

Because of the gravity from the Moon and because the program runs in discrete space, there are some errors. Things are not ideal. Therefore, when my program finds that it can get close to the track of the target, it starts adjustment as the next step. It runs simulations for various initial velocities again and again. This process takes a lot of time, and my fast simulator helped with it.

The last question is how to choose the next target from the 10 targets. My approach was a greedy algorithm with a heuristic. If it finds a promising plan for a target, it starts the journey except in two cases. One case is that it withdraws the plan if the plan takes too much fuel. The other case is that the target is too far away. The latter exception is a performance optimization. It is checked before the first step to avoid unnecessary calculations for targets that are too far away (e.g., the debris around the moon isn't a good candidate for the first target). I know this is not optimal, but it seemed to be good enough.

So, as I described, I think my approach isn't the smartest way, but I guess the good point of my program was its robustness. I've heard that some teams' programs unfortunately crashed on some typical test cases and that some teams' solutions only worked for the given test cases.

Performance might also have been a key to my success. It seemed that some teams had solutions similar to mine. However, most of them didn't have a fast simulator and just used the VM to simulate the physics. Also, I think C++ helped me write solid and fast code.

Thanks to the organizers for the excellent contest!

Sunday, August 2, 2009

Symbolic Polyglot Quine

Recently, I wrote the following code:

http://shinh.skr.jp/obf/sym_poly_quine.txt

This script is a Quine, so it outputs the program itself to stdout without reading any file. You can run this script with Ruby, Perl, and JavaScript; it is a polyglot of these three languages. And this script was written only with symbolic characters (!"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~). It contains no letters, digits, whitespace, or 8-bit characters.

Perl is OK only with symbolic characters. You can convert any Perl scripts into symbolic style scripts using Acme::EyeDrops. This module is awesome.

http://search.cpan.org/dist/Acme-EyeDrops/

Also, all JavaScript scripts can be converted into symbolic style. jjencode does the magic.

http://utf-8.jp/public/jjencode.html

As for Ruby, Ruby 1.9 is perfect for symbolic programming. kurimura found the way to convert arbitrary Ruby scripts into symbolic format.

http://d.hatena.ne.jp/kurimura/20080824

Unfortunately, I think Ruby 1.8's symbolic programming is very limited. For example, you cannot create a loop. However, as a quine only requires substitution and output to stdout, which are doable with symbolic Ruby 1.8, my script can run with Ruby 1.8.

There are some other weird code like this:

http://shinh.skr.jp/obf/