Tuesday, April 2, 2013

104B Hello, world! ELF binary for x86-64 linux

I created a tiny (104-byte) Hello, world! ELF binary for x86-64 Linux after seeing a similar page. The binary itself is at http://shinh.skr.jp/obf/hello_linux_elf_x64.out and here is the assembly code in NASM:
        org     0x01000000
        db      0x7F, "ELF"     ; e_ident
        db      "o, world!", 10
        db      0, 0
        dw      2               ; e_type
        dw      62              ; e_machine
        dd      1               ; e_version
        dq      _start          ; e_entry
        dq      phdr - $$       ; e_phoff
                                ; e_shoff
        mov     AL, 4           ; write = 4
        int     0x80
        xchg    EAX, EDI        ; exit(0)
        xchg    EAX, EBX        ; exit = 1
        int     0x80
        dd      1               ; e_flags & p_type
        dw      7               ; e_ehsize & p_flags
        dw      56              ; e_phentsize & p_flags
        dw      1               ; e_phnum & p_offset
        dw      0               ; e_shentsize & p_offset
        dw      0               ; e_shnum & p_offset
        dw      0               ; e_shstrndx & p_offset
        dq      $$ + 1          ; p_vaddr
                                ; p_paddr
        inc     EBX             ; stdout = 1
        mov     DL, 14          ; strlen = 14
        inc     ECX
        jmp     cont
        dq      filesize - 1    ; p_filesz
                                ; p_memsz
        shl     ECX, 24
        db      0x25            ; and EAX, 0 (fall through)
        db      0, 0, 0, 0
                                ; p_align
        xor     dword[RCX], 0x2a202037
        jmp     cont2
filesize equ    $ - $$
I also created a spreadsheet to explain this. If this is interesting to you, you may want to check my collection as well. My x86-64 code is much bigger than the 58B hello because both the ELF header and the program header are much bigger on x86-64 than on x86-32. I couldn't find a better way to overlap the ELF header and the program header, and all of my code and data live inside these headers, so I'm assuming 104B is optimal. Although this work should be easier than binary golf for x86, there were a few challenges:
  • The 1-byte inc/dec encodings are gone (0x40-0x4F became REX prefixes).
  • As mmap for small addresses isn't allowed on recent Linux (see /proc/sys/vm/mmap_min_addr), I had to use addresses bigger than 16 bits. In fact, the 58-byte hello for x86-32 doesn't work on recent Linux distributions for this very reason. I needed inc and shl to set the address of "Hello, world!\n" in ECX.
  • As we cannot access 0x0000-0x1000, most data bytes cannot be executed as code. For example, the bytes 0x00 0x00 decode to add [RAX], AL, which faults at such addresses.
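To see why the overlap is the whole game, here is a quick sanity check in Python (the sizes and offsets are the standard ELF64 ones from the spec, not read from my binary):

```python
# Sizes and offsets from the ELF64 spec.
EHDR_SIZE = 64       # sizeof(Elf64_Ehdr)
PHDR_SIZE = 56       # sizeof(Elf64_Phdr), matching e_phentsize above
E_FLAGS_OFFSET = 48  # offset of e_flags inside the Ehdr

# Laid out back to back, the two headers alone would need 120 bytes:
print(EHDR_SIZE + PHDR_SIZE)       # 120

# Starting the program header at e_flags (as the "e_flags & p_type"
# comment in the listing shows) overlaps the last 16 bytes of the
# ELF header, and the whole file is exactly the headers:
print(E_FLAGS_OFFSET + PHDR_SIZE)  # 104
```

So 16 bytes of overlap is all the ELF format seems to allow here, which is why I believe 104B is the floor.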

Thursday, December 6, 2012

ShaFuck is not unbeatable

When I found ShaFuck, I really loved the idea, and I thought it was indeed impossible to write code in this language.

However, I found a way to write code in ShaFuck, and I was able to write a script that translates Brainfuck code into ShaFuck code.


Here is some evidence:

> ./shafuck hello.sf
Hello, world!
> ./shafuck cal.sf
2012 03
 2  3  4  5  6  7  8
 9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
30 31 

The idea was very simple. We cannot find a 1024-byte chunk that translates into 20 fully meaningful BF commands (the probability is 1/32^20 if SHA1 is random), but we can find a chunk that becomes 3 bytes of BF commands and 17 bytes of junk. Since BF comments (i.e., any characters other than the eight BF commands) are allowed as long as they aren't executed, we only need to skip the junk. Luckily, I found a translation:

+ => +>[ (17B)  (18B) ]<
- => ->[ (17B)  (18B) ]<
> => >[  (18B)  (18B) ]>
< => <[  (18B)  (18B) ]<
. => .>[ (17B)  (18B) ]<
, => ,>[ (17B)  (18B) ]<
[ => >[  (18B)  (17B) ]<[
] => >[  (18B)  (17B) ]<]

With my translation, each BF command is translated into 2 ShaFuck code chunks (i.e., 2048 bytes of code). The trick is that we never modify the contents of odd addresses, so they always hold zero, and we can use them to skip the junk. Note that > and < move the pointer twice.
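To see concretely why the junk is never executed, here is a minimal Brainfuck interpreter in Python (my own sketch, not part of ShaFuck): when the pointer sits on a zero cell, [ jumps straight over everything up to the matching ], junk commands included.

```python
def bf(code, tape_len=30000):
    """Minimal Brainfuck interpreter; non-command bytes are comments."""
    tape, ptr, pc, out = [0] * tape_len, 0, 0, []
    jump, stack = {}, []
    for i, c in enumerate(code):          # precompute matching brackets
        if c == '[':
            stack.append(i)
        elif c == ']':
            j = stack.pop()
            jump[i], jump[j] = j, i
    while pc < len(code):
        c = code[pc]
        if c == '+': tape[ptr] = (tape[ptr] + 1) % 256
        elif c == '-': tape[ptr] = (tape[ptr] - 1) % 256
        elif c == '>': ptr += 1
        elif c == '<': ptr -= 1
        elif c == '.': out.append(chr(tape[ptr]))
        elif c == '[' and tape[ptr] == 0: pc = jump[pc]  # skip the body
        elif c == ']' and tape[ptr] != 0: pc = jump[pc]  # loop back
        pc += 1
    return tape, ''.join(out)

# The translation of '+' above: the bracketed junk sits on a zero
# cell, so even junk made of real BF commands is skipped entirely.
tape, _ = bf("+>[-------.....]<+")
print(tape[0])  # 2
```

Only the two '+' commands take effect; the '-' and '.' inside the brackets never run, which is exactly the property the translation table relies on.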

I found code chunks which translate into the above sequences by brute force.


I used sha1.c and sha1.h from ShaFuck's reference implementation. It took about 3 minutes to find the code chunks we need.
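The search can be sketched in Python. This is an assumption-laden reimplementation, not my original script: I assume, as described above, that each 1024-byte chunk is hashed with SHA1 and each digest byte is read as one character. Each exact digest byte costs a factor of 256, so a 3-byte prefix like "+>[" needs on the order of 256^3 hashes, which is consistent with the few minutes my C version took.

```python
import hashlib
import itertools

def find_chunk(prefix: bytes, suffix: bytes = b"") -> bytes:
    """Brute-force a 1024-byte chunk whose SHA1 digest starts with
    `prefix` and ends with `suffix` (digest bytes read as characters)."""
    for i in itertools.count():
        chunk = i.to_bytes(8, "little") * 128  # any 1024-byte candidate
        digest = hashlib.sha1(chunk).digest()
        if digest.startswith(prefix) and digest.endswith(suffix):
            return chunk

# A 1-byte prefix is found almost instantly (~256 tries on average);
# the 3-byte sequences in the table above just take more iterations.
chunk = find_chunk(b"+")
print(hashlib.sha1(chunk).digest()[0:1])  # b'+'
```

The chunks ending in "]<" would use the `suffix` argument instead, since the meaningful commands sit at the end of the digest there.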

I used shafuck-0.2 for this entry. Future versions may not have this vulnerability. I think the simplest "fix" for this issue is to validate the code after calculating the SHA1 and before running it.

Wednesday, July 18, 2012

ICFP Programming Contest 2012

The problem was to write a program which automatically solves Boulder Dash efficiently (the fewer turns you consume, the better).

As usual, I joined the lightning division (the first 24 hours of this 72-hour contest) with C++. However, I decided to do something different this year because I was getting a bit bored of this kind of TopCoder marathon match style problem. So, I wrote the Boulder Dash solver in Befunge:


I guess this is one of the biggest hand-written Befunge programs. Technically, this isn't Befunge-93 code because it uses an address space bigger than 80x25, so it requires a Funge-98 implementation like cfunge. However, I think it is fair to claim this is Befunge code because I only used Befunge-93 operations. Anyway, it wasn't easy to write, although I think I'm fairly good at Befunge (I once won a Befunge contest at Codeforces). Like other participants, I did almost nothing except write this code, and it was just 6 hours before the end of the contest when I finished my simulator in Befunge...

The following is the full package I submitted. It includes a Befunge interpreter I wrote during the contest, some Makefile and README stuff, the code for the lightning division, etc.


This is an attempt to describe my code. You can see a visualization of the dungeon around the bottom right of the image. It is just a memory dump - Befunge has a 2D address space, so no external visualizer is needed.

BTW, I forgot to write about ICFP programming contest 2011. It was great. The problem was better than any other contest problem I've ever seen, and the organizers prepared a fairly stable duel server where participants could fight each other. I took 6th place. The 5th place team was also a one-person team, and he got the judges' prize for being one. So I was the 2nd place one-person team, yay.

Sunday, May 13, 2012

The 20th IOCCC

I won the 20th IOCCC! http://ioccc.org/2011/whowon.html My code is a paint-by-numbers solver which itself looks like a paint-by-numbers problem. I think my code is decent: it is fairly obfuscated due to the size limit, the bit operations, and the difficulty of paint by numbers itself. But obviously, akari.c is much better than mine...

Saturday, September 12, 2009

ICFP Programming Contest 2009

Surprisingly, I won first place in the ICFP programming contest 2009. I'm very grateful for this result - it was much better than I expected. I was in 9th place before the validation process.

This post describes the approach of my solution. Here is my source code:


* FAQs

Before the detailed description, here are some questions asked by other contestants.

Q. Did you go to the moon?
A. No, I couldn't. I briefly considered whether I could go to the moon, but I decided not to. As it takes a lot of fuel and the scores for far debris aren't big, I prioritized improving stability and reaching nearby debris faster.

Q. Did you solve traveling salesman problem?
A. No. I chose the next target almost greedily. Please see the details below as well.

Q. Did you solve difficult mathematical formulas?
A. No. Though I tried to find an analytic solution, the effort was just a waste of time. Basically, I only used Hohmann transfers. Even if we had an analytic solution, we would need some adjustment for errors due to the discrete time steps anyway.

Q. 1 person team?
A. Yes. I always attend this contest because I'd like to do everything by myself.

Q. How long did you sleep?
A. Not sure, but maybe ~20 hours. I cannot write valid code with too little sleep. That is a lesson learned from previous contests.

Q. Why C++?
A. I love C++. By the way, I don't think I'm hostile to functional programming languages. Actually, I tried some of them a bit (maybe I wrote from ~1k to ~3k lines of toy code in each). I thought OCaml was good, but its standard library is poor; perhaps I should have tried ExtLib or something like it. Haskell was slow when I tried it. I read some of the assembly GHC produced, but it was very difficult for me to understand... I tried a Scheme interpreter, and it was an interpreter after all. Maybe I'll try compilers (Stalin?). Commercial Common Lisp implementations sound great, but I only have open-source ones. So I think C++ is the most practical language for me for now. I know I'm biased - I've spent much more time on C++ than on other languages. I'm looking forward to functional programming fans' continued great work.

Q. Why not other imperative languages?
A. I feel Java is less free than C++. As for C#, my experience with it is too small to discuss. I'm (or was?) a fan of D, and I actually used D in this contest three times (2004, 2006, and 2007), but I don't use it these days. Maybe I'll use it again in next year's contest.

* Detail

This post is based on the script of my talk for vidiowiki: http://vidiowiki.com/watch/m844dyn/ (Hmm... it's embarrassing to watch myself talking in my poor English. But it was a good experience for me.)

Overall, I think my approach wasn't so special. Just like many other teams probably did, I wrote a VM, a visualizer, a physics simulator, and a Hohmann transfer routine.

My VM was fast because it translates the input binary into C code. Otherwise it's ordinary.

The visualizer is also ordinary; nothing to say about it. You can see a YouTube video (sorry for its poor quality) at http://www.youtube.com/watch?v=IUGiiFsnLLs

I also had a physics simulator which calculates future states quickly. It was a bit tricky to implement correctly, for two reasons. The first is that the binary the organizers provided was wrong: the gravity from the moon didn't affect our spaceship. Also, the gravity from the moon is calculated as g(t+1) = G*M/(p(t+1)-pm(t))^2, where p is the position of a body and pm is the position of the moon; I think pm(t) should have been pm(t+1). Anyway, with some reverse engineering work, I implemented the simulator correctly, and its fast simulation helped me a lot.
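The discrete dynamics can be sketched like this (my own illustration in Python, not the contest's exact semantics; I use a semi-implicit Euler step, a single gravitating body, and rounded constants):

```python
import math

G_M = 6.67428e-11 * 6.0e24   # G * M for an earth-like body (approx.)

def step(pos, vel, dt=1.0):
    """One discrete simulation step: semi-implicit Euler with
    gravity pulling toward the origin."""
    x, y = pos
    r = math.hypot(x, y)
    g = G_M / (r * r)                 # |acceleration| = G*M / |p|^2
    ax, ay = -g * x / r, -g * y / r   # direction: toward the earth
    vx, vy = vel[0] + ax * dt, vel[1] + ay * dt
    return (x + vx * dt, y + vy * dt), (vx, vy)

# Sanity check: a body on a circular orbit (v = sqrt(G*M/r)) should
# keep its radius nearly constant under this discrete update.
r0 = 7.0e6
pos, vel = (r0, 0.0), (0.0, math.sqrt(G_M / r0))
for _ in range(1000):
    pos, vel = step(pos, vel)
print(abs(math.hypot(*pos) / r0 - 1.0) < 0.01)  # True
```

The contest's pm(t) vs. pm(t+1) quirk is exactly the kind of off-by-one in such an update loop that only reverse engineering can settle.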

With these three tools, I implemented my strategy, which was basically a kind of brute force.

First, my simulator calculates the future positions of a target for 5000 seconds. The track should be an elliptic arc around the earth. Then it checks whether I have a chance to reach a point close enough to the arc with an ideal Hohmann transfer.

As there is gravity from the moon and the program runs in discrete time steps, there are always some errors; things are not ideal. Therefore, once my program finds it can get close to the target's track, it starts an adjustment phase: it runs simulations with various initial velocities again and again. This process takes a lot of time, and my fast simulator helped here.
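For reference, the ideal Hohmann transfer mentioned above comes down to two burns. Here is a sketch of the textbook delta-v formulas (standard orbital mechanics, not code from my solution; the radii below are arbitrary example values):

```python
import math

def hohmann_dv(mu, r1, r2):
    """Delta-v of the two burns of a Hohmann transfer from a circular
    orbit of radius r1 to a circular orbit of radius r2."""
    dv1 = math.sqrt(mu / r1) * (math.sqrt(2 * r2 / (r1 + r2)) - 1)
    dv2 = math.sqrt(mu / r2) * (1 - math.sqrt(2 * r1 / (r1 + r2)))
    return dv1, dv2

# Raising an earth orbit from 7000 km to 42164 km (GEO-like radii):
mu = 3.986e14  # earth's gravitational parameter, m^3/s^2
dv1, dv2 = hohmann_dv(mu, 7.0e6, 4.2164e7)
print(round(dv1), round(dv2))  # about 2337 1434 (m/s)
```

The adjustment phase then perturbs these ideal burns to absorb the discrete-step and lunar-gravity errors.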

The last question is how to choose the next target from the 10 targets. My approach was a greedy algorithm with a heuristic. If it finds a promising plan for a target, it starts the journey, except in two cases: it withdraws the plan if it takes too much fuel, or if the target is too far away. The latter exception is a performance optimization; it is checked before the first step to avoid unnecessary calculations for distant targets (e.g., the debris around the moon isn't a good candidate as the first target). I know this is not optimal, but it seemed acceptable enough.

So, as described, I don't think my approach was the smartest, but I guess the strong point of my program was its robustness. I've heard that some teams' programs unfortunately crashed on typical test cases, and some teams' solutions only worked for the given test cases.

Performance might also have been a key to my success. It seemed some teams had solutions similar to mine, but most of them didn't have a fast simulator and just used the VM to simulate the physics. Also, I think C++ helped me write solid and fast code.

Thanks to the organizers for the excellent contest!

Sunday, August 2, 2009

Symbolic Polyglot Quine

Recently, I wrote the following code:


This script is a quine: it outputs the program itself to stdout without any file input. You can run it with Ruby, Perl, and JavaScript - it is a polyglot of these three languages. And it is written only with symbolic characters (!"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~) - no alphabets, numbers, whitespace, or 8-bit characters.

Perl is fine with only symbolic characters. You can convert any Perl script into symbolic style using Acme::EyeDrops. This module is awesome.


Also, any JavaScript can be converted into symbolic style; jjencode does the magic.


As for Ruby, Ruby 1.9 is perfect for symbolic programming. kurimura found a way to convert arbitrary Ruby scripts into symbolic form.


Unfortunately, I think Ruby 1.8's symbolic programming is very limited - for example, you cannot create a loop. But as a quine only requires substitutions and output to stdout, which is doable in symbolic Ruby 1.8, my script also runs with Ruby 1.8.

There is some other weird code like this:


Friday, June 5, 2009

TCC 0.9.25

I worked on porting TCC to x86-64, and the project has recently released a new version which contains my changes.


I confirmed that my changes pass the TCC test suite and successfully compile some applications such as link and Lua.

As there are probably still some bugs in the x86-64 support, I'd really appreciate it if you tried this release on x86-64 and reported bugs.

Regarding the implementation... it was more difficult than I expected. Code generation for x86-64 was relatively straightforward because there was already support for x86, which is similar. The most difficult part was the relocation-related stuff. As the distance between two addresses in a 64-bit address space can be larger than 32 bits, some 32-bit relative references need to be fixed up via the PLT and GOT. TCC's -run mode was also tricky for the same reason.
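The fix-up condition is just arithmetic (my own illustration, not TCC's actual code): a 32-bit relative reference stores a signed displacement, so it only works when the target is within about ±2^31 of the reference.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def fits_rel32(ref_addr: int, target: int) -> bool:
    """Can `target` be reached from `ref_addr` with a 32-bit signed
    relative displacement (e.g. an x86-64 call rel32)?"""
    return INT32_MIN <= target - ref_addr <= INT32_MAX

# Code and its target mapped near each other: a direct call works.
print(fits_rel32(0x400000, 0x600000))        # True
# Code low in memory, a library mapped high: needs a PLT/GOT hop.
print(fits_rel32(0x400000, 0x7f0000000000))  # False
```

When the check fails, the reference has to go through a nearby stub (PLT) or a table of absolute addresses (GOT) instead of a direct rel32.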

Another thing that often confused me was the 64-bit system itself. As much of TCC's code assumes a 32-bit system, it sometimes casts pointers to int, which can be incorrect on a 64-bit system. The following simple code shows the difficulty:

#include <stdio.h>
#include <stdlib.h>

int main() {
    printf("%p\n", malloc(10));
    printf("%p\n", malloc(1000000));
    return 0;
}

This code produces output like


on my Linux box. When we allocate a small memory chunk from the heap, malloc returns an address smaller than INT_MAX. So even if we cast the returned pointer to a 32-bit integer, there is no problem as long as we only allocate small chunks. This hid many bugs from me. Such bugs only appear in big programs, which means it's difficult to create minimal failure cases...
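The failure mode can be illustrated without C (a Python sketch of the truncating cast; the addresses are made-up but typical of heap vs. mmap'd regions on x86-64 Linux):

```python
def to_int32(ptr: int) -> int:
    """Simulate casting a 64-bit pointer to a 32-bit signed int."""
    v = ptr & 0xFFFFFFFF
    return v - 2**32 if v >= 2**31 else v

heap_ptr = 0x0060_2010       # small-heap address: survives the cast
mmap_ptr = 0x7F3A_1234_5000  # high mmap'd address: gets truncated
print(to_int32(heap_ptr) == heap_ptr)  # True  - the bug stays hidden
print(to_int32(mmap_ptr) == mmap_ptr)  # False - the bug finally shows
```

Only when an allocation lands above 4GB does the round-trip stop being the identity, which is why the bug hides in small programs.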

About Me

http://shinh.skr.jp/ (my website in Japanese)