I run a golf server that executes code posted by random users in a sandbox. This post explains how the sandbox is implemented. Note that my golf server has an unusual requirement for its sandbox: it exists mainly to prevent cheating, not to provide security. I think this makes my sandbox slightly different from other implementations.
My golf server consists of a frontend and a backend. The frontend server is a normal Ruby program. It speaks HTTP and sends the submitted code to the execution server in the backend. The execution server is a TCP server written in Ruby. It talks with the frontend, sets up the sandbox, and runs the submitted code.
VM
I don't mind too much if a user manages to break my server, because it doesn't hold any very important information. However, I don't want an attacker to use my server to attack another server, even if the attacker manages to become root on the backend.
So, the execution server is running in a VM (Xen), and the VM isn't connected to the external network. It can only connect to the host, which runs the web frontend.
System calls
I implemented a simple kernel module to hook the system calls I want to audit. Its source code is at
https://github.com/shinh/ags/blob/master/be/modules/sandbox.c
To modify the system call table, you need to know the address of the syscall table. This address is just hardcoded:
#define SYS_CALL_TABLE ((void**)0xc1295348)
You can obtain this value from /boot/System.map-`uname -r`.
Several syscalls, such as setsid, setuid, and sendto, just return -EPERM.
I think the requirements of a golf server are somewhat unusual. Unlike normal sandbox implementations built for security, we sometimes need to allow execve. Some language implementations, such as Scala and R, use execve internally, even for a simple "hello world" program. However, we don't want users to call execve an arbitrary number of times. It's not fun if the shortest C program always starts with
main(){system("perl ...");}
So, my sandbox allows users to call execve, but it counts how many times execve is called. If this number is larger than the number required to run a "hello world" program, the frontend rejects the solution even if the output is correct.
The execution server needs to interact with the kernel module to initialize the sandbox and obtain some numbers (e.g., the number of execve calls). I think normal kernel modules would use procfs or something similar. However, as I know almost nothing about the kernel and don't want to depend on its internal structures, my sandbox provides the interface through hooked syscalls.
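The acceptance rule above can be sketched in a few lines of C. This is only an illustration: the real check lives in the Ruby frontend, and the names `submission_result`, `accept_submission`, and `baseline_execve` are made up for this example.

```c
#include <stdbool.h>

/* Hypothetical record of one run; the field names are assumptions. */
struct submission_result {
    bool output_correct;  /* did the program print the expected output? */
    int  execve_count;    /* number of execve calls counted by the sandbox */
};

/*
 * baseline_execve: number of execve calls a "hello world" program
 * needs in the same language (assumed to be measured beforehand).
 */
bool accept_submission(const struct submission_result *r, int baseline_execve)
{
    if (!r->output_correct)
        return false;
    /* Correct output is not enough: extra execve calls mean rejection. */
    return r->execve_count <= baseline_execve;
}
```

With a baseline of 1, a run that produced the right output but was counted with 2 execve calls would be rejected.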
My sandbox abuses getpriority and setpriority for this purpose. I chose them just because their interfaces are convenient for my purpose.
int getpriority(int which, id_t who);
int setpriority(int which, id_t who, int prio);
If the parameter "which" is a magic number (1764), they get/set internal values in the sandbox. For example, you can get the number of execve calls with
getpriority(1764, __NR_execve)
It also provides a weird feature. If you call setpriority(1764, __NR_getpid, pid), you can set the next PID to the specified number. Sometimes golfers want to assume the PID is a specific number. For example, the shortest Ruby program which outputs 9999 is "p$$". Note that $$ is the PID in Ruby. Without this interface, users would need to DoS my server to get the PID they want. There is also a web frontend for this weird feature.
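Here is a minimal sketch of how a client could talk to this interface from C. It is not taken from my execution server: the helper names are hypothetical, and it assumes the module returns the counter as the raw syscall result. It uses syscall(2) instead of the libc getpriority wrapper because glibc rewrites the return value of the real getpriority syscall.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

#define SANDBOX_MAGIC 1764  /* magic "which" value described above */

/*
 * Read a sandbox value, e.g. key = SYS_execve for the execve counter.
 * Returns 0 and stores the value in *out, or -1 with errno set.
 * Assumption: the module returns the value unchanged as the syscall result.
 */
int sandbox_get(long key, long *out)
{
    errno = 0;
    long v = syscall(SYS_getpriority, SANDBOX_MAGIC, key);
    if (v == -1 && errno != 0)
        return -1;
    *out = v;
    return 0;
}

/* Ask the sandbox to hand out `pid` as the next PID. Returns 0 or -1. */
int sandbox_set_next_pid(long pid)
{
    return (int)syscall(SYS_setpriority, SANDBOX_MAGIC, (long)SYS_getpid, pid);
}
```

On a kernel without the module, both calls should simply fail, since 1764 is not a valid "which" value.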
Alternatives
- ptrace
  - At first, my sandbox used this. However, it is slow.
- seccomp-bpf
  - I didn't know about this when I implemented my sandbox. Even if I had known, I don't think I could have used it, because my kernel didn't support it. If I wrote a sandbox for a golf server again, I think I'd use this.
- SELinux
  - I tried to study this, but it was too difficult for me :( It seemed that SELinux does not satisfy the unusual requirement (counting the number of syscall invocations instead of just letting them fail), but I'm not sure.
- Modify language implementations
  - codegolf.com did this. But this approach is impractical when you support many languages. As of writing, my server supports 102 languages.
Files
If a file can persist, users can easily cheat. They submit code that writes a solution to a file, and then submit other code that just outputs its content. This makes problems like Text Compression very boring: everyone would submit code like "cat /tmp/a".
So, all files must be removed after the execution. To achieve this, I removed all writable locations except /dev/shm. As some languages require /tmp, /var/tmp and $HOME, they are prepared as symlinks to /dev/shm. /dev/shm is a tmpfs and is re-mounted after each execution.
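The re-mount step could look roughly like the following in C. This is a sketch under assumptions, not the code my server runs: the function name `reset_scratch`, the mount flags, and the option string are all made up for illustration.

```c
#include <errno.h>
#include <sys/mount.h>

/*
 * Throw away everything under `path` by unmounting the tmpfs there and
 * mounting a fresh, empty one. Needs root (CAP_SYS_ADMIN).
 * `opts` is a tmpfs option string such as "size=64m" (an assumed value).
 * Returns 0 on success, -1 with errno set on failure.
 */
int reset_scratch(const char *path, const char *opts)
{
    /* MNT_DETACH detaches the mount even if files on it are still open. */
    if (umount2(path, MNT_DETACH) != 0 && errno != EINVAL)
        return -1;  /* EINVAL: nothing was mounted there, so carry on */

    return mount("tmpfs", path, "tmpfs", MS_NOSUID | MS_NODEV, opts);
}
```

Calling reset_scratch("/dev/shm", "size=64m") after each submission would also empty /tmp, /var/tmp and $HOME, because they are only symlinks into /dev/shm.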
Misc.
My server is not robust against DoS attacks. However, I don't want innocent users to accidentally stop my server often. To mitigate the risk, the execution server sets RLIMIT_NPROC and RLIMIT_RSS with setrlimit. The number of fork calls is also limited by the kernel module.
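A minimal sketch of the setrlimit part, assuming the limits are applied in the child process before the submitted code starts. The helper names and the numbers are placeholders, not the values my server uses.

```c
#include <sys/resource.h>

/*
 * Lower both the soft and the hard limit of `resource` to `value`,
 * so that the submitted code cannot raise the limit again.
 * Returns 0 on success, -1 with errno set on failure.
 */
int set_limit(int resource, rlim_t value)
{
    struct rlimit lim;
    if (getrlimit(resource, &lim) != 0)
        return -1;
    /* An unprivileged process may not go above the current hard limit. */
    if (value > lim.rlim_max)
        value = lim.rlim_max;
    lim.rlim_cur = value;
    lim.rlim_max = value;
    return setrlimit(resource, &lim);
}

/* Placeholder numbers; the real values are not given in this post. */
int apply_submission_limits(void)
{
    if (set_limit(RLIMIT_NPROC, 64) != 0)
        return -1;
    return set_limit(RLIMIT_RSS, 256UL * 1024 * 1024);
}
```

Note that, according to the getrlimit(2) man page, recent Linux kernels no longer enforce RLIMIT_RSS, so a setup like this would need another mechanism to cap memory.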
If you can run a process which persists, the process can hold a solution and pass it to a subsequent process. To prevent this, the execution server kills all user processes except itself after it handles a submission.
Once, a smart person managed to cheat on my server by abusing IPC. So, my server now limits the size of IPC buffers by writing values to procfs.
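As an illustration, shrinking IPC buffers through procfs could be done like this. Which files and values my server actually writes is not listed in this post, so the three sysctl paths and the numbers below are only guesses, and the function names are hypothetical.

```c
#include <stdio.h>

/*
 * Write a decimal value to a sysctl file under /proc/sys.
 * Returns 0 on success, -1 on failure.
 */
int write_sysctl(const char *path, long value)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%ld\n", value) > 0;
    /* fclose flushes the buffer; a rejected write shows up here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Assumed knobs and values, for illustration only. Needs root. */
int limit_ipc_buffers(void)
{
    if (write_sysctl("/proc/sys/kernel/msgmax", 64) != 0)
        return -1;
    if (write_sysctl("/proc/sys/kernel/msgmnb", 64) != 0)
        return -1;
    return write_sysctl("/proc/sys/kernel/shmmax", 4096);
}
```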
Summary
I don't think my server is uncheatable, but this sandbox seems to be enough to prevent users from cheating too easily.