Irken Kitties

Solving a Danish Defense Intelligence Puzzle

While I was browsing the Reverse Engineering sub on Reddit a few months ago, I came across a puzzle that the poster said came from a Danish newspaper. It consisted of a single fairly large image, with a small amount of x86 assembly on one side, and a large block of text on the other, formatted to display a question mark. So, having finally had the time to sit down and solve this recently, I thought I would do a writeup, explaining my thought processes along the way in the hopes someone can learn from it. Another goal here is to expose people to the majesty of Radare2, which is a Vim-like commandline reverse engineering tool that follows the principles of the Unix philosophy.

The Challenge

It looks like a CrackMe, or capture the flag exercise. The x86 assembly is clearly a virtual machine, and I assumed the block of text on the right would be a binary that runs on that virtual machine. I call the machine, for lack of a better name, Dan32, because as I later found out, it is a 32-bit virtual machine, and originates from Denmark.

The block of text on the right is base64 encoded, which is easy enough to to convert back into a binary file, but since it is an image, we can’t directly get at that block in a text format without doing some kind of optical character recognition. We can guess it is base64 encoded by the characters used, and really after you’ve seen a lot of base64, you can usually spot it pretty easily.

I tried a few online OCR services, which did not work, and since I had invested almost no time into this, I was ready to say the hell with it. I was not about to type all that base64 text into my text editor by hand.

I did end up solving this puzzle and creating tools to reverse engineer it, what follows is a detailed writeup, read on for more.

Note: If you are using a blocker such as Privacy Badger, like I do, I’ve noticed the terminal movie playback embeds from asciinema.org may be blocked by default. If you wish to see those in this post, you’ll have to toggle that domain to allow in your plugin, though you don’t need to accept cookies from that domain for it to work.

Getting the Base64 Text

Staring at it a bit longer, we notice certain characters in the base64 side are bolded, and if we go through and write down each bolded character, it spells out some nonsense MzJoYWNrZXI1NTd6amt6aS5vbmlvbgo..

I thought, since we are looking at a massive bunch of base64, that maybe this was also base64 encoded. We can use a tool called rax2 which is a part of Radare2 in order to decode it like this:

1
2
$ rax2 -D MzJoYWNrZXI1NTd6amt6aS5vbmlvbgo.
32hacker557zjkzi.onion

It’s a vanity .onion address on the TOR network. The site, which unfortunately is not online anymore, has downloads for both the assembly listing, and the base64, saving us from needing to worry about how to get those characters into our computer by hand.

My approach to these sorts of files, that might be malicious or not, is usually to use hexdump or a hex editor program to look at them before going any further. After doing this to the base64 file, I notice that it is full of ANSI terminal escape sequences, and that these ones are for positioning text at (x,y) coordinates, setting bolding etc. This is because if you were to cat the file to your terminal, it would reproduce the formatting seen in the image, with the question mark and all, which is pretty cute, and are actually required to put the text in the right order to be decoded.

Before I cat this to my terminal, I wrote a script to check each of the ANSI escape sequences to make sure they were only positional and style commands, and nothing weird or malicious. They turned out alright, so I printed it to my terminal and copy pasted the text into a file. Then I wrote another script to remove the end of line hyphens, join it all together, and base64 decode it, resulting in a binary file that I named disk.img

You can find the complete source code for all of the Radare2 plugins I wrote to solve this on my github.

The Virtual Machine

The provided x86 assembly for the virtual machine is bare bones, but it tells us everything we need to know to run this binary. The label OP_TABLE points to an enumeration of each opcode the VM supports, and the order, so we know the numeric value of the that op.

Assembly Listing

Some more information we learn from the given asm is

  • There are at least 64 registers in this machine
  • It must be a 32-bit machine
  • %define REG(r) [REGS + r * 4] Registers are 32-bits wide
  • %define PTR(p) [MEM + p] It requires some read/write memory space
  • lea esi [DISK + esi] It requires some read/write space to act as a disk
  • mov eax, [OP_TABLE + eax * 4] Every opcode is 4 bytes wide
  • cmov is the only way to do conditionals

Even after learning all that information, it’s incomplete, some of the opcodes are not given implementations, such as write, in, div, and the various sized load.x, store.x, and nor, to name a few. So we’ll need to look at what’s given, and implement those ourselves.

My Philosophy of Reversing

Here’s where a major part of my reverse engineering philosophy comes in, I don’t as a rule like to run random binaries given to me, especially in malware/crackme situations. If I take the VM’s assembly listing, complete the missing implementations, and run the mystery binary disk.img, I have literally no idea what it is capable of at this point. Worst case scenario is that binary knows about flaw in the given virtual machine, and exploits it for a VM escape onto my host system and starts doing shit.

I’m heavy on the static analysis side, but at this point I don’t have any debugger, or analysis tools that even understand this made up computer architecture. What I want to do, is use Radare2 to reverse engineer the binary, so I’m going to need to teach Radare2 about this file format, computer architecture, invent a textural assembly language, and so on. And that’s the real fun of this challenge for me, honestly, so that’s what I did. Radare2 allows you to write plugins to extend it, so it can understand any CPU, real or imagined, and simulate its running through ESIL (Evaluable Strings Intermediate Language).

Radare2 Plugins

The first Radare2 plugin to write, is the asm plugin. This plugin takes the 32-bit machine level opcodes and fills in a structure with information about that opcode, its arguments, and it provides a textual representation for viewing a disassembly listing.

In order to do this, we’ll write a plugin in C. The asm plugin’s main function has the following prototype

1
static int disassemble(RAsm *a, RAsmOp *op, ut8 *buf, ut64 len);

The parameters to disassemble are:

  • RAsm *a is the current assembler context
  • RAsmOp *op is the structure we need to fill in
  • ut8 *buf are the opcode bytes we are disassembling
  • ut64 len is the length of buf

The important fields of RAsmOp to fill in here, are buf_asm which holds the textual representation of the disassembled opcode, and size, the size of the opcode.

Looking at the provided x86 assembly code, we can see how to dismantle a 32-bit opcode into its constituent parts, remember all opcodes are 4 bytes long or 32-bits.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
mov ebp, edx
shr ebp, 21
and ebp, 77o
mov esi, edx
shr esi, 15
and esi, 77o
mov edi, edx
shr edi, 9
and edi, 77o

mov eax, edx
shr eax, 27
mov eax, [OP_TABLE + eax * 4]
jmp eax

Becomes

1
2
3
4
5
6
7
8
9
10
11
12
#define SIX_BIT 077

//  Cast 4 bytes from buf to a unsigned 32 bit value
ut32 dword = *(ut32*)buf;

//  32 - 27 leaves a 5-bit opcode
ut8 op_index = (dword >> 27);

//  Extract 3 6-bit arguments using the SIX_BIT mask
ut8 edi = (dword >> 9 ) & SIX_BIT;
ut8 esi = (dword >> 15) & SIX_BIT;
ut8 ebp = (dword >> 21) & SIX_BIT;

Next, for convenience, we make a lookup table that maps 0 to 63 to the corresponding register name. I happen to know from the future, that r62 is the stack pointer, and r63 is the instruction pointer, but I didn’t know this at the time. It makes reading the disassembly a lot easier though once we know this.

1
2
3
4
5
6
7
8
9
10
11
12
static const struct {
    char *name;
} regs[] = {
  { "r00" }, { "r01"  }, { "r02" }, { "r03" }, { "r04" }, { "r05" }, { "r06" }, { "r07" },
  { "r08" }, { "r09"  }, { "r10" }, { "r11" }, { "r12" }, { "r13" }, { "r14" }, { "r15" },
  { "r16" }, { "r17"  }, { "r18" }, { "r19" }, { "r20" }, { "r21" }, { "r22" }, { "r23" },
  { "r24" }, { "r25"  }, { "r26" }, { "r27" }, { "r28" }, { "r29" }, { "r30" }, { "r31" },
  { "r32" }, { "r33"  }, { "r34" }, { "r35" }, { "r36" }, { "r37" }, { "r38" }, { "r39" },
  { "r40" }, { "r41"  }, { "r42" }, { "r43" }, { "r44" }, { "r45" }, { "r46" }, { "r47" },
  { "r48" }, { "r49"  }, { "r50" }, { "r51" }, { "r52" }, { "r53" }, { "r54" }, { "r55" },
  { "r56" }, { "r57"  }, { "r58" }, { "r59" }, { "r60" }, { "r61" }, { "esp" }, { "eip" }
};

Since in the disassembly output we’re going to be referencing things by register name a lot, I grab the textual names for each argument as well.

1
2
3
char *edi_reg = regs[edi].name;
char *esi_reg = regs[esi].name;
char *ebp_reg = regs[ebp].name;

What follows in the disassemble function is a switch statement on op_index, where we just need to fill in the op size and the textual representation of the opcode itself. So I’ll show a few of those here, you can see the full source of these plugins here

1
2
3
4
  case 11:
    snprintf(op->buf_asm, R_ASM_BUFSIZE, "nor %s, %s, %s", ebp_reg, esi_reg, edi_reg);
    op->size = 4;
    break;

So for example the nor instruction, which wasn’t provided in the image, just uses snprintf to write out our human readable disassembly, and sets the op->size = 4. This ends up producing something like nor r21, r57, r57.

Quickly taking a look at another example, movi is the move immediate value instruction, and looks like this:

1
2
3
4
5
6
7
8
9
10
11
case 16:
  eax = dword;
  ecx = dword;
  eax >>= 5;
  eax &= 0xffff;
  ecx &= 037;
  eax <<= (ecx & 0xff);

  snprintf(op->buf_asm, R_ASM_BUFSIZE, "movi %s, 0x%0x", ebp_reg, eax);
  op->size = 4;
  break;

Notice op->size = 4 for all instructions, and setting op->size = -1 indicates an invalid operation. The above movi instruction actually encodes an immediate value directly into the opcode itself. This is the only instruction which does this, all other instructions must move values into a register to operate on them. Again, this is a straight translation from from the given x86 asm.

Other instructions had to be put together just following the pattern that was set out for us. For example, div, mul, nor all work the same as the given mul opcode. All said, it is not a lot of work to get a fully functioning disassembler going in Radare2.

Here is the last part of the plugin, where we hook our code up by setting callbacks, and some information:

1
2
3
4
5
6
7
8
9
10
11
12
13
RAsmPlugin r_asm_plugin_dan32 = {
  .name = "dan32",
  .author = "safiire@irkenkitties.com",
  .license = "None",
  .desc = "Dan32 disassembler",
  .arch = "dan32",
  .bits = 32,
  .init = NULL,
  .fini = NULL,
  .disassemble = &disassemble,
  .modify = NULL,
  .assemble = NULL,
};

And here’s the result, a nice looking assembly readout that we can use to start reversing the binary.

Disassembly

Radare Analysis Plugin

With the above plugin we can now see human readable disassembly of the binary, but Radare doesn’t have enough information about this architecture yet to allow us to step through the program and simulate it like you would in a debugger. And you can’t yet perform static analysis like you would get with IDA. Radare supports about one zillion architectures already, but since this CPU was probably invented just for this challenge, we’ll have to add support ourselves.

Radare’s answer to this is ESIL, (Evaluable Strings Intermediate Language), providing a register profile for the CPU, and using those to create an analysis plugin. An analysis plugin expects us to implement a function like this, to set the register profile.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
static int set_reg_profile(RAnal *anal) {
  const char *p =
  "=A0  r03\n"
  "=A1  r04\n"
  "=A2  r05\n"
  "=LR  r59\n"
  "=PC  r63\n"
  "=SP  r62\n"
  "gpr r00 .32   0 0\n gpr r01 .32   4 0\n gpr r02 .32 ... etc"
  "gpr r08 .32  32 0\n gpr r09 .32  36 0\n gpr r10 .32 ..."
  "gpr r16 .32  64 0\n gpr r17 .32  68 0\n gpr r18 .32 ..."
  "gpr r24 .32  96 0\n gpr r25 .32 100 0\n gpr r26 .32 ..."
  "gpr r32 .32 128 0\n gpr r33 .32 132 0\n gpr r34 .32 ..."
  "gpr r40 .32 160 0\n gpr r41 .32 164 0\n gpr r42 .32 ..."
  "gpr r48 .32 192 0\n gpr r49 .32 196 0\n gpr r50 .32 ..."
  "gpr r56 .32 224 0\n gpr r57 .32 228 0\n gpr r58 .32 ..."
  return r_reg_set_profile_string(anal->reg, p);
}

Here we specify all 64 general purpose registers in the machine, and also give aliases to registers that have special meaning. The format is gpr <registername> .<size in bits> <offset>.

With this we can create a register file containing registers of various sizes, which can overlap. For example in x86, we can specify register gpr ax .16 0, but also specify the high and low bytes as gpr ah .8 8 and gpr al .8 0.

Dan32 doesn’t have overlapping registers, or high and low register access by name, so we don’t need to do this.

Some register aliases are A0, A1, A2, which are for arguments that are passed to functions via register, which is pretty common in this binary. LR is the link register, which like on an ARM CPU holds the return address of a function, PC, is the instruction pointer, and SP is the stack pointer, so I’ve filled those in after having gotten some experience with the binary’s two calling conventions.

The next task for the analysis plugin is to create ESIL for each and every instruction supported by the CPU. There are not many instructions so this didn’t take very long.

The plugin must implement an analysis function with the following prototype, which looks extremely similar to the asm plugin function:

1
static int dan32_anal_op(RAnal *anal, RAnalOp *op, ut64 addr, const ut8 *data, int len);

Here, we’re asked to fill in more information about the opcode in the given RAnalOp *op parameter, it looks something like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
struct RAnalOp {
  id       // I use the opcode index
  esil     // A string containing ESIL
  size     // Size of opcode, this is always 4 in dan32
  nopcode  // No idea, other plugins set this to 1
  addr     // Address this opcode is at
  jump     // If this opcode jumps somewhere, the address
  fail     // Where to jump on failure condition
  ptr      // Pointer to the primary data we're working with
  val      // Value of the primary data we're working with
  type     // enum of types such as R_ANAL_OP_TYPE_CJMP for conditional jump
  family   // enum of type family such as R_ANAL_OP_FAMILY_IO for I/O
}

All of these are pretty important for proper analysis, but the most important, so that we can simulate this binary inside radare2, without running it on the untrusted VM we were given, is the ESIL. Here is an example of ESIL for movi, the move immediate value instruction:

1
2
3
4
5
movi r57, 0x41

; becomes

0x41,r57,=

ESIL is a stack machine, turing complete, so it is able to represent the instructions for any CPU, it is like a ridiculous sort of microcode almost. A more complicated instruction cmov, the conditional move instruction, looks like this:

1
2
3
4
5
cmov eip, r57 if r23

;  becomes

r23,?,{,r57,eip,=,}

So after each instruction is codified by type and given an ESIL representation, we’re done. If you are interested in how ESIL works, here’s the docs. I’ve written some pretty crazy ESIL for the disk sector read/write code, and stack machines are not my favourite, but they work :) Here is some of the longest ESIL I wrote for one opcode. It reads 512 bytes from a numbered disk sector, into a given memory address.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
read [r57], sector(r21)

;  becomes

0x0,r40,=,
0x200,r21,*,
0x200000,+,
r40,+,
[8],
r57,r40,+,
=[8],
0x8,r40,+=,
0x200,r40,==,!,
?{,3,GOTO,}

Disassembling the binary in Radare2

Ok, so now we’re all set to get a disassembly view of this binary, we’ll just load it up in Radare2, hit play to see how it goes.

“Wrong Endianness”. Now, there are a few things going on here, so let’s look at the first instruction: movi r00, 0x78200. I don’t want to get to bogged down in the details, but I know from the future, that register r00 is like the zero register on a MIPS system, it always contains the value 0, and so here writing 0x78200 to that register, is effectively a no-op, and we’ll see why that’s done in the next part.

Next up we have movi eip, 0x14. There are no jump instructions in this opcode set, and unlike x86, you can write to the instruction pointer register to get a jump. Interestingly, jumping to 0x14 is not a multiple of 4, and so we’re jumping out of alignment, which is why we’re seeing the disassembler isn’t able to interpret a few instructions after that at first.

When we get to the address 0x14, we end up at series of instructions that loads immediates, and then uses out to print them out to the display.

1
2
3
4
5
6
7
8
9
10
11
0x00000014  e00a2087    movi r57, 'W'
0x00000018  000020cf    out r57
0x0000001c  21072087    movi r57, 'r'
0x00000020  000020cf    out r57
0x00000024  e00d2087    movi r57, 'o'
0x00000028  000020cf    out r57
0x0000002c  e1062087    movi r57, 'n'
0x00000030  000020cf    out r57
0x00000034  e00c2087    movi r57, 'g'
0x00000038  000020cf    out r57
0x0000003c  25002087    movi r57, ' '

A little bit of radare knowledge, the immediate values were displayed as hex to begin with, so I wrote a little radare expression to hint to it that those immediates are actually string or char values using the ahi command, which stands for “analyse hint immediate”.

Radare2 is terse as hell, and you get very used to it, and probably, maybe, start loving it. The expression below basically creates a range from the current address, denoted as $$, to $$ + 17 * 8 with a step of 8 bytes. The @@= functions as an iterator, which runs the command ahi s on each address in the range, telling radare the immediate values are character values.

1
ahi s @@=`?s $$ $$+17*8 8`

Anyway, the real problem is we’re interpreting the binary as little endian, when it’s actually big endian. So we can just go back to our plugin and fix that pretty simply in the disassemble function by reversing the bytes, and setting the endian properly.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
//  Decode the op
ut8 big_end[4];
big_end[0] = buf[3];
big_end[1] = buf[2];
big_end[2] = buf[1];
big_end[3] = buf[0];

ut32 dword = *(ut32*)big_end;
ut8 op_index = (dword >> 27);       // 5-bit opcode

// ...

RAsmPlugin r_asm_plugin_dan32 = {
  // ...

  .endian = R_SYS_ENDIAN_BIG,

  // ...
};

Correct Endian

Back to the entrypoint of our binary, what was once a no-op, when read backwards, jumps us past all the “Wrong Endian” stuff, and begins displaying the binary properly so we can reverse it.

The opcode 87e00180 when read in big endian jumps us with movi eip, 0xc, and another jump movi eip, 0xa8, bringing us finally to some actual code.

1
2
3
4
5
6
7
0x000000a8   movi r20, 0xc0
0x000000ac   movi r21, 0x14c
0x000000b0   movi r22, 0xbe
0x000000b4   store.b [r22], r00
0x000000b8   read [r00], sector(r00)
0x000000bc   goto r21
;  print "Disk read error!"

This is the first in a series of tricks and tests that the binary performs on the virtual machine itself to make sure it is implemented properly. With the use of a bin plugin for dan32, which I’m not going to bore you with here but is available on GitHub with the rest of the code, I’ve tried to setup a memory layout that would be familiar for someone like myself who works with ELF or PE files. Here is that layout.

1
2
3
4
5
6
7
00  0x00000000 |#-------------------------| 0x0000092b  2.3K mrwx .text
01  0x0000092c |#-------------------------| 0x00000b79   589 mrw- .bss
02* 0x00000c00 |#-------------------------| 0x00007503 26.3K mrw- .encrypted
03  0x000ffc00 |------------#-------------| 0x00100000    1K mrw- .stack
04  0x00200000 |-------------------------#| 0x00207503 29.3K mrw- .diskrom
05  0x00100000 |------------#############-| 0x001f0000  960K -rwx esil.ram
=>  0x00001dc0 |--------------------------| 0x00001ec0

Remembering the DISK address that was mentioned in the x86 assembly VM, which is meant to represent a readable writeable disk area from which the program is loaded. This area stores disk.img in the .diskrom section at address 0x200000. I probably shouldn’t have called it a diskrom, since you can write to it, but I didn’t know it was going to be written to at the time, so it’s too late now. I actually believed it was going to be something like a game cartridge rom at first, but oh well.

The code is executed from the .text section, with entrypoint 0x0, and we have a .bss section which contains some initialized data that is used in the program. The read and write instructions are used to copy data from the disk into memory by 0x200 byte sectors.

So the trick here, is that at 0xb4 we are writing from r00, which always contains zero to address 0xbe, which is altering an instruction. Then, if your read instruction works properly this is immediately corrected by reloading the entire first sector from .diskrom back into memory, undoing the damage. If your read instruction is not working, you will be greeted by the text “Disk read error!” and the program will halt.

Notice how the analysis plugin is working, showing beautiful ascii arrows that point to the destinations of our jumps. When the zero is written to address 0xbe, it modifies the instruction, and we see the control flow is taking us directly towards “Disk read error” and a halt. The read immediately fixes this and the control flow updates.

The Next Test

Next we move on to 0x14c, which is an area of the binary that sets up the stack pointer, and reads the rest of the program from .diskrom one sector at a time.

I guess here we can get our first look at how dan32 goes about things.

1
2
3
4
5
6
7
8
;-- fcn_read_other_sectors:
0x0000014c      add r61, eip, r00
0x00000150      movi r57, 0x4
0x00000154      nor r57, r57, r57
0x00000158      add r61, r61, r57
0x0000015c      movi r57, 0x1
0x00000160      add r61, r61, r57
0x00000164      movi esp, sym.stack_end

Here’s some things to take note of right off the bat:

  • There is no subtraction instruction
  • There is no concept of signed numbers
  • We can still have negative numbers using 2’s complement
  • There are no other bitwise logical operations besides NOR
  • NOR is universal, and can construct any other gate
  • There is no direct way to move one register into another one, so addition + 0 is used
  • nor(a, a) flips all the bits of a
  • nor(a, a) is equivalent to -(a + 1) in 2’s complement
  • r00 is a hardware zero register
  • r57 is used as a temporary register

There’s a few patterns we see using NOR throughout this binary. Above we want to save the instruction pointer to r61, and then subtract 4 from it. This is done many times in this binary like this:

1
2
3
4
5
6
add r61, eip, r00     ;  r61 = eip
movi r57, 0x4         ;  r57 = 4
nor r57, r57, r57     ;  r57 = -5
add r67, r61, r57     ;  r61 = r61 - 5
movi r57, 0x1         ;  r57 = 1
add r61, r61, r57     ;  r61 = r61 + 1

This is a roundabout way of just saying r61 = eip - 4, but that’s what we’re dealing with here :)

The next test the binary performs on the virtual machine, is to test the div instruction. Since this instruction was not provided in the x86 assembly code, it is to ensure we’ve got the argument order right, and we’re not allowing division by zero. If we’ve done it wrong, we’re sent off to some code that prints “ALU Malfunction (DIV)” and halts the program.

By the way, these symbols such as fcn.alu_malfunction_div, fcn.main, and so on, were added by me while reversing the binary to make it more clear what is going on.

1
2
3
4
5
6
7
nor r20, r00, r00                       ; r20 = -1
div r20, r00, r20                       ; r20 = 0 / -1
movi r57, fcn.alu_malfunction_div       ;
add r57, r57, r00                       ; does nothing
cmov eip, r57 if r20                    ; if(r20) goto fcn.alu_malfunction_div
movi r57, fcn.main                      ;
goto r57                                ; goto main()

That’s also our first look at conditionals in dan32. There are no compare instructions, there is no zero flag, and no conditional jumps like jne, as you find in other instruction sets.

An interesting side note I guess, is that there is no real compare instruction in x86 either, the cmp instruction on that processor is actually an alias for subtracting the two values. When they are equal, since a - a = 0, this sets the zero flag, which is what instructions like jne are conditional on.

The Main Function

Now that the Radare plugins are working, let’s have a look around the binary, simulate it a bit, and look around. This is loading a project file where I’ve already reversed the entire binary, but gives a good idea of how it’s working.

Here’s where we start getting some idea of what this binary is up to, and finally get to see some proper functions such as print(), scan(), memcmp() and things like that implemented.

1
2
3
4
5
6
;-- fcn.main:
0x0000028c      movi r03, str.password
0x00000290      movi r57, 0x8
0x00000294      add r59, eip, r57
0x00000298      movi r57, fcn.print
0x0000029c      goto r57

First up, radare has identified a string “Password: “, and helpfully renamed its address as the symbol str.password for us. Here we can see one of the two calling conventions in action. This one is a lot like fastcall, where we load the first few arguments of a function into registers r03, r04, r05, and end up with our return value in r01.

Remember I identified r59 as the link register, and that is used as our return value. So here, the calling convention is to calculate the return address, eip + 8, two instructions away, and store it into r59, we then load the address of the print function fcn.print into a temp register, and jump there.

Throughout the binary, r57 is always used as a temporary register. There are others such as r20, r21 which are always used as counters or array indices. In fact, this assembly code is so consistent in the way it does things and which registers it uses, that I wonder if it was emitted by a machine, or written by hand by someone who is just awesome.

Now that we know our arguments, our return address, and where we’re going, that about fully describes this calling convention. There is also a stack based calling convention like you would find on x86 32-bit, which I may write about later.

The Print Function

So don’t worry I’m not going to bore you to death by literally explaining every function, but this print one is a fairly simple example to start with.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
;-- fcn.print:

;  Initialize r20 to 0 as an array index
add r20, r00, r00
movi r57, lbl.head
goto r57

lbl.body:
   ;  Output the character in r21
   out r21
   ;  r20 += 1
   movi r57, 0x1
   add r20, r20, r57

lbl.head:
   ;  r21 = r03[r20]
   load.b r21, [r03 + r20]
   movi r57, lbl.body
   add r57, r57, r00

;  return if r21 == 0
cmov eip, r57 if r21
goto r59

This function is simple, but normally to reverse a difficult function, I will slowly replace elements of the disassembly with C, until I have a C function. In this case, we’d have a for loop:

1
2
3
4
5
void print(char *r03){
  for(r20 = 0; r03[r20] != 0; r20++){
    out r03[r20]
  }
}

The Purpose of the Program

I’ve kept from mentioning the actual purpose of this program for way too much of this article. If written in C, the main function would just about look like the following code. This was figured out by reversing each function in turn, and I got a happy surprise at the end, we’re going to be dealing with encryption.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
void main(){
  printf("Password: ");
  chars_read = scan(passphrase_input);

  if(chars_read == 0){
    print_eh_halt();
  }

  print("Initializing Encryption...");
  burn_cpu();

  rc4_key_schedule(key_schedule, passphrase_input, chars_read);
  print("Ok");
  print("Checking Key");
  burn_cpu();

  decrypt(key_schedule, encrypted_test_data, 0x38);
  result = memcmp(encrypted_test_data, correct_decryption, 0x38);

  if(result != 0){
    print_bad_key_halt();
  }

  print("Ok");

  rc4_key_schedule(key_schedule, passphrase_input, chars_read);
  print("Decrypting Disk Image");

  decrypt_disk();
}

So what’s going on here, is we’re going to do an in-place decryption of the DISK section if we’ve entered the right passphrase.

We’re able to figure all this out, without running this binary at all, through static analysis and a bit of ESIL simulation. I guess the question is how did I know which function was doing the decryption, what actual encryption algorithm was being used, and how I’m going to figure out the passphrase without even running the binary.

The answer to the first question is that I knew I would probably need to find XOR somewhere in this program which would XOR the ciphertext with the key stream, but since we have no XOR instruction, I knew it would need to be created with a group of NOR, which I found pretty easily.

XOR using NOR Gates

So I spotted that pretty easily, and pinpointed the main decryption routine.

1
2
3
4
5
nor r58, r21, r01
nor r21, r58, r21
nor r57, r58, r01
nor r21, r21, r57
nor r21, r21, r21

The answer to how did I know which encryption algorithm it was, is more funny. I didn’t know which one it was. I had stepped through the key scheduling function a few times, after it prints “Initializing Encryption”, and thought that it was basically key stretching the passphrase. It was only later when I was randomly reading through a writeup on some malware which used RC4 encryption, that I realized what I was looking at the same RC4 key scheduling algorithm.

Cracking the Passphrase

The main encrypted part of the binary was identified by the address that was being passed to the decrypt() function, which was 0xc00. I also previously noticed this while running the binary through the entropy function of the program binwalk. Here is the output from binwalk -E

1
2
3
4
5
6
$ binwalk -E disk.img

DECIMAL       HEXADECIMAL     ENTROPY
--------------------------------------------------------------------------------
0             0x0             Falling entropy edge (0.553052)
3072          0xC00           Rising entropy edge (0.972975)

One of the weaknesses of some encryption schemes, is in how it checks if the passphrase is valid before decryption. Say for example you enter a passphrase on a zip file or something, and the unzip program just blindly decrypts the file without knowing if the password is valid. It’s going to produce total garbage if the passphrase is wrong, and the program won’t have any way of letting you know that you’ve entered the wrong password.

So a common, bad, way to verify the password first, is to have some known ciphertext, plaintext pair that is encrypted using the passphrase right in the binary. You enter the passphrase, it decrypts this small ciphertext, compares it to the known plaintext, and if it’s correct, it says “yay” and moves on to decrypting the rest of the file. If it’s wrong, it says “boo”, and doesn’t decrypt the file into garbage.

This is what’s happening in our dan32 binary. The known ciphertext, plaintext pair is included, meaning we just have to crack that.

Here we can see that before decrypting the DISK section, it tries to decrypt a small 56 byte buffer, and then compares that to a valid string that is included in the program.

1
2
3
4
5
6
  decrypt(key_schedule, encrypted_test_data, 0x38);
  result = memcmp(encrypted_test_data, correct_decryption, 0x38);

  if(result != 0){
    print_bad_key_halt();
  }

Here we can output the short encrypted buffer, and its valid decryption. If the passphrase we give doesn’t decrypt this short buffer correctly, the program will halt. Here I use the px Radare command to do a hexdump of the test ciphertext, and another ps command to print the plaintext string.

1
2
3
4
5
6
7
8
9
10
11
px 0x38 @ sym.another_buffer_i_think

- offset -   0 1  2 3  4 5  6 7  8 9  A B  C D  E F  0123456789ABCDEF
0x00000964  7216 fa85 4c3c d0e4 5905 7954 d203 eb95  r...L<..Y.yT....
0x00000974  1601 e73b 6dc8 642f 742f 5419 aabe ea31  ...;m.d/t/T....1
0x00000984  9306 c9e1 fa65 830f 5118 a727 94ff 9634  .....e..Q..'...4
0x00000994  5af7 4c29 85de 8714                      Z.L)....

ps 0x38 @ 0x0000092c

Another one got caught today, it's all over the papers.

That means in order to crack this passphrase, I only need to figure out how to successfully decrypt this 56 byte buffer, which is something I can do entirely outside of this binary. I decided to reverse the key scheduling and decryption routine, and rewrite it in C so that I could brute force the password quickly outside of this environment, and outside of Radare.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
//  Here is the reversed RC4 key schedule from Dan32
void rc4_key_schedule(char *key_schedule, char *key_input, int length){
  char *ptr = (key_schedule + 2);
  for(int i = 0; i < 0xff; i++){
    ptr[i] = i;
  }

  int j = 0;
  for(int i = 0; i < 0xff; i++){
    j = (j + ptr[i] + key_input[i % length]) & 0xff;
    char t1, t2;
    t1 = ptr[i];
    t2 = ptr[j];
    ptr[i] = t2;
    ptr[j] = t1;
  }
  key_schedule[0] = 0;
  key_schedule[1] = 0;
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
//  Here is the PRGA from Dan32
char pseudo_random_generation_algorithm(char *key_schedule){
  int i = key_schedule[0];
  int j = key_schedule[1];

  char *s = zero_gap + 2;

  i = (i + 1) & 0xff;
  j = (j + s[i]) & 0xff;

  key_schedule[0] = i;
  key_schedule[1] = j;

  char t1, t2;
  t1 = s[i];
  t2 = s[j];
  s[i] = t2;
  s[j] = t1;

  return s[ (s[i] + s[j]) & 0xff ];
}

And finally the decrypt routine.

1
2
3
4
5
6
7
8
//  RC4 Decrypt from Dan32
void decrypt(char *key_stream, char *cipher_text, int length){
  for(int i = 0; i < length; i++){
    char key = pseudo_random_generation_algorithm(key_stream);

    cipher_text[i] = cipher_text[i] ^ key;
  }
}

I made this into a complete program, that takes a passphrase as argument, and then wrapped it in a short Ruby script that repeatedly tries passwords from a list I have, until the decrypted result matches. This only took about 3 minutes, the passphrase ended up being agent This was lucky, because it could have been a lot harder if it wasn’t a simple word, I would have needed to use hashcat or something a bit more sophisticated. There are also flaws with RC4 itself, which directly relate to the problems with WEP, but I didn’t need to go that route.

I can’t really call writing this code a waste of time, since I ended up needing to do it in order to actually identify the algorithm, but there are far easier ways to decrypt RC4 that I could have used, for example, Radare2 comes with a program called rahash2 which can, among about a zillion other things, be used to decrypt RC4.

1
2
3
4
5
6
7
8
9
10
11
$ hexdump -C test.bin

00000000  72 16 fa 85 4c 3c d0 e4  59 05 79 54 d2 03 eb 95  |r...L<..Y.yT....|
00000010  16 01 e7 3b 6d c8 64 2f  74 2f 54 19 aa be ea 31  |...;m.d/t/T....1|
00000020  93 06 c9 e1 fa 65 83 0f  51 18 a7 27 94 ff 96 34  |.....e..Q..'...4|
00000030  5a f7 4c 29 85 de 87 14                           |Z.L)....|
00000038

$ rahash2 -D rc4 -S s:agent test.bin

Another one got caught today, it's all over the papers.

Decrypting the Disk Image

At this point I am thinking I will just dump the high entropy section of the binary from 0xc00 onwards out to a separate file, and decrypt it with rahash2 and be done with it, but when I try this, I end up with unintelligible garbage, that isn’t proper dan32 code, and isn’t anything else I can recognize.

The DISK section is divided into 512 byte sectors, and it turns out they are not decrypted in the order they appear in the file. The order of the pseudorandomly generated keystream matters since it’s a stream cipher, and so that is why I’m getting garbage out. I decided then to just simulate the decryption process within Radare using ESIL, since I put so much work into properly defining each opcode in ESIL, it does simulate the VM perfectly.

The only problem is, that I have not implemented the in and out opcodes for doing IO, so I would be running the program blind, and be unable to enter the passphrase or see printed output.

Fixing the IO Problem

An easy way to avoid writing the in opcode, is for me to simulate the program up until it is about to ask me for a passphrase, stop there, and just write the passphrase into memory at the right address, and skip over the scan() function entirely and continue afterwards, so that’s what I’ve decided to do.

For the out opcode, there is a less hacky solution. I can use ESIL to simulate an interrupt, and attach that interrupt to an external program that will receive the character value to be printed. I wrote a short Ruby script which accepts an argument and prints it to the standard output. And inside Radare2 simulate the binary like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
#  Setup ESIL
s 0
aei
aeim
aeip

#  Allow the binary to virtually write into a memory cache
e io.cache = true

#  And attach out.rb as an interrupt handler
e cmd.esil.intr=!./out.rb


#  Advance to right before we call scan to ask for password
aecu 0x000002ac


#  Write the RC4 password into the right place in memory
#  The length is written to the return value register

wz agent @ 0x1200
aer r01 = 0x5


#  Skip over the call to scan(), and continue from there
#  r63 is the instruction pointer

aer r63 = 0x000002b4
aecu 0x000004fc

The Result

This takes a while to complete, so I just went off and did something else. Another thing, not shown here is that the binary often calls functions that just do nothing but waste enormous amounts of time counting, which have no effect on the output. I patched these calls out of the binary so I wouldn’t have to wait 2000 years for it to finish.

Once all is said and done, the binary has completely rewritten the DISK section into yet another binary, and we’re given this message:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
$ # Look, I'm an HTTP server now!
$ cat xinetd.conf
service httpd
{
    disable     = no
    socket_type = stream
    protocol    = tcp
    wait        = no
    bind        = 0.0.0.0
    server      = u5emu
    server_args = disk.img
    type        = UNLISTED
    port        = 80
    user        = root
}

I dump the DISK segment into an actual file, and reload that in Radare, and sure enough it has decrypted itself into a webserver written in dan32. I reversed this new binary for a while, and found it contained:

  • Embedded html files
  • A GIF of Morpheus from the Matrix
  • A PNG background file
  • Some lzma compressed issues of Phrack Magazine
  • The Hacker’s Manifesto, by the Mentor
  • Some zlib compressed files
  • And the finally, the flag, written in Danish

I used binwalk to extract these from the binary and looked through the contents. I got the flag, so I’m calling this one done. Good experience overall, 10/10 would crack again. I’m so proficient in reading dan32 assembly now, that it’s a shame I’ll probably never have any use for it again, it’s a pretty nice VM.