Crackme 21: JCWasmx86’s cm001

After a break of a few months where I pursued other projects, I was missing a good RE challenge so looked for new submissions on crackmes.one. I found JCWasmx86’s cm001 crackme which had good ratings, so decided to give it a try. As usual, the binary is mirrored here.

The description says it’s written in C and assembly, and that it runs on Linux. The goal is to “find one of the valid passwords. (There are many possible ones)”.

Let’s start by running the binary:

$ ./cm001
Input:
hello
Failed to validate password
$ echo $?
1

strings reveals a promising message:

$ strings ./cm001 | grep password
You found the password:
Failed to validate password

Reading the input in main

Jumping into disassembly, we can easily find main since this binary seems to have most if not all of its symbols intact. In main, a 65-byte buffer is filled with null bytes and used to receive user input using fgets:

004010b1  pxor    xmm0, xmm0                    ; erases the 128-bit xmm0 register
004010b5  lea     rdi, [rel data_40202c]        ; loads a string
004010bc  sub     rsp, 0x50                     ; reserves some stack space
004010c0  movaps  xmmword [rsp {var_58}], xmm0  ; moves these zero'd 128 bits to var_58
004010c4  mov     rbp, rsp {var_58}
004010c7  movaps  xmmword [rsp+0x10 {var_48}], xmm0 ; erases the next 16 bytes
004010cc  movaps  xmmword [rsp+0x20 {var_38}], xmm0 ; and 16 more
004010d1  movaps  xmmword [rsp+0x30 {var_28}], xmm0 ; and 16 more
004010d6  mov     byte [rsp+0x40 {var_18}], 0x0     ; as well as a terminating null byte
004010db  call    puts                              ; prints "Input:"
004010e0  mov     rdx, qword [rel stdin]    ; arg to fgets: read from stdin
004010e7  mov     esi, 0x41                 ; fgets will read up to 65-1 bytes
004010ec  mov     rdi, rbp {var_58}         ; into var_58
004010ef  call    fgets
004010f4  mov     rdi, rbp {var_58}         ; then use var_58
004010f7  call    strlen                    ; as an argument to strlen
004010fc  mov     rdi, rbp {var_58}
004010ff  mov     esi, eax                  ; eax contains the result from strlen
00401101  call    sub_40152b                ; jump somewhere else

Note the use of the SSE register xmm0, which is 16 bytes or 128 bits wide. The sequence above corresponds approximately to:

char var_58[65] = {0};
puts("Input:");
fgets(var_58, sizeof(var_58), stdin);
size_t eax = strlen(var_58);
sub_40152b(var_58, eax);

A first input constraint on xmm0

Let’s continue in sub_40152b. I assumed above that it took 2 parameters, since the user input is placed in rdi and the length in esi. These are the first 2 registers used to pass parameters in the System V AMD64 ABI calling convention.

This function starts with an immediate comparison of the input length and 0x40, jumping over a large block of code if the two don’t match. We can probably infer that our password needs to be 64 characters long:

0040152b  cmp    esi, 0x40                ; compare length and 0x40
0040152e  mov    qword [data_4040a0], rdi ; also store a pointer to our input at this address
00401536  jne    sub_40168a               ; no match: jump way down

Right away, we start unpacking the input into SSE registers, from xmm0 to xmm3. That’s 4 registers of 16 bytes each, covering the full 64 bytes of expected input:

0040153c  movupd  xmm0, xmmword [rdi]       ; first 16 bytes
00401540  movupd  xmm1, xmmword [rdi+0x10]  ; next 16…
00401545  movupd  xmm2, xmmword [rdi+0x20]  ; etc.
0040154a  movupd  xmm3, xmmword [rdi+0x30]

A long sequence of XOR operations follows, one over 16 bytes and the rest over single bytes. Its starts with xmm0 being XOR’d with the 128-bit xmmword at data_404080, which points to the string "dG46rskj8-457(~:". After this initial wide XOR, the program extracts single bytes from the xmm0 register with pextrb (extracts a single byte at the given offset), and using eax as an XOR “accumulator” combines the extracted bytes together:

0040154f  pxor    xmm0, xmmword [data_404080] ; XORs this "secret" string with our first 16 bytes
00401558  xor     eax, eax        ; resets eax
0040155a  pextrb  r8d, xmm0, 0x0  ; r8d <- input[0] ^ secret[0]
00401561  xor     eax, r8d        ; eax ^= r8d
00401564  pextrb  r8d, xmm0, 0x1  ; r8d <- input[1] ^ secret[1]
0040156b  xor     eax, r8d        ; eax ^= r8d
0040156e  pextrb  r8d, xmm0, 0x2  ; ... the same
00401575  xor     eax, r8d
00401578  pextrb  r8d, xmm0, 0x3
0040157f  xor     eax, r8d
00401582  pextrb  r8d, xmm0, 0x4
00401589  xor     eax, r8d
0040158c  pextrb  r8d, xmm0, 0x5
00401593  xor     eax, r8d
00401596  pextrb  r8d, xmm0, 0x6
0040159d  xor     eax, r8d
004015a0  pextrb  r8d, xmm0, 0x7
004015a7  xor     eax, r8d
004015aa  pextrb  r8d, xmm0, 0x8
004015b1  xor     eax, r8d
004015b4  pextrb  r8d, xmm0, 0x9
004015bb  xor     eax, r8d
004015be  pextrb  r8d, xmm0, 0xa
004015c5  xor     eax, r8d
004015c8  pextrb  r8d, xmm0, 0xb
004015cf  xor     eax, r8d
004015d2  pextrb  r8d, xmm0, 0xc
004015d9  xor     eax, r8d
004015dc  pextrb  r8d, xmm0, 0xd
004015e3  xor     eax, r8d
004015e6  pextrb  r8d, xmm0, 0xe
004015ed  xor     eax, r8d
004015f0  pextrb  r8d, xmm0, 0xf  ; all the way to the 16th byte
004015f7  xor     eax, r8d        ; all XOR'd together
004015fa  cmp     eax, 0x0        ; and if all of that is not zero
004015fd  jne     sub_40168a      ; then jump somewhere.

To think about these bytes, we have to consider the endianness of the plaform: with input[0:16] XOR dG46rs… stored in xmm0, using pextrb to extract a byte at a specific offset will select that byte starting from the least significant one. Here, the instruction at 0x0040155a that pulls byte zero with pextrb r8d, xmm0, 0x0 is setting r8d to input[0] XOR secret[0].

A quick look at sub_40168a tells us what we suspected: it prints an error message and exits:

0040168a  mov     rax, 0x1   ; system call 1: sys_write
00401691  mov     rdi, 0x1   ; fd=1, that's stdout
00401698  mov     rsi, data_404060  {"Failed to validate password\n"}
0040169f  mov     rdx, 0x1c  ; 0x1c is the message length, 28 bytes
004016a6  syscall            ; prints it
004016a8  mov     rax, 0x3c  ; syscall 0x3c is sys_exit
004016af  mov     rdi, 0x1   ; with status code 1.
004016b6  syscall            ; bye

From this point, I’ve renamed sub_40168a to print_failed_to_validate_and_exit to make it more visible.

Now that we have our first constraint to apply to the first 16 bytes of input, let’s continue back after the jne that started our detour.

A second constraint, now on xmm1

00401603  mov     rax, 0x11         ; rax is set to 0x11
0040160a  add     rax, rax          ; doubled
0040160d  xor     r8, r8            ; r8 is reset to zero
00401610  pextrb  r8d, xmm0, 0xb    ; r8d <- input[11] ^ secret[11]
00401617  imul    rax, r8           ; rax <- rax * r8
0040161b  add     rax, rax          ; rax <- rax * 2
0040161e  add     rax, rax          ; rax <- rax * 2
00401621  add     rax, 0x13a        ; rax <- rax + 0x13a
00401627  movupd  xmmword [data_4040b0], xmm1   ; copies xmm1 into this static variable
00401630  xor     rcx, rcx          ; resets rcx (used below as a counter)
00401633  mov     r8, 0x4040b0      ; r8 <- 0x4040b0 (the address of the variable we just wrote xmm1 to)
0040163a  xor     rbx, rbx          ; resets rbx

0040163d  mov     bl, byte [r8+rcx] ; takes one byte from xmm1[counter] <――╮
00401641  sub     rax, rbx          ; removes it from rax                  │
00401644  inc     rcx               ; increments the counter               │
00401647  cmp     ecx, 0x10         ; if not equal to 16                   │
0040164a  jne     0x40163d          ; loop back up  ―――――――――――――――――――――――╯

0040164c  cmp     rax, 0x0
00401650  jne     print_failed_to_validate_and_exit

Side-note: r8d and r8b are to r8 what eax and al are to rax.

To recap in pseudo-code:

// we start by computing a value based on input[11] ^ secret[11]
rax = 0x11 * 2;
r8d = input[11] ^ secret[11];
rax *= r8;  // multiplied
rax += rax; // doubled
rax += rax; // doubled again
rax += 0x13a;
// equivalent to: rax = (0x88 * (input[11] ^ secret[11]) + 0x13a)

ptr = xmm1 (== &input[16]);
counter = 0;
do {    // for each of the next 16 bytes (xmm1 goes from 16 to 31)
    r8 = ptr;
    rbx = 0;
    bl = ptr[counter];  // we take byte[i]
    rax -= bl;          // and remove it from the computed value rax
    counter++;
} while (counter != 16);

if (rax != 0) { // they must match
    print_failed_to_validate_and_exit();
}

This gives us another constraint on the input.

Escaping the error message

Looking at the instructions below this point, we start to notice an impasse: the jumps to print_failed_to_validate_and_exit were just jumps, not function calls. And we’re fast approaching this location, in fact there are no conditional jumps between the last jne print_failed_to_validate_and_exit we saw at 0x00401650 and the error message print itself. It almost feels like we went down the wrong path, since we clearly have no way out.

Here’s where we are, in context:

current state: always falling through to the error message

With seemingly no way to avoid the error message, let’s take a closer look at the few instructions ahead of us. Starting at 0x00401652:

00401652  movq    rax, xmm3                 ; movq takes the lower 64 bits from xmm3
00401657  pextrq  rbx, xmm2, 0x1            ; extracts 64 bits from byte 1 of xmm2, into rbx
0040165e  xor     rax, rbx                  ; xors the two together, stores the result in rax
00401661  mov     rbx, 0xffff231203         ; puts this long constant into rbx (no clear reason for this)
0040166b  pextrq  r8, xmm3, 0x1             ; extracts 64 bits again from offset 1 in xmm3 into r8
00401672  movq    r9, xmm2                  ; and the lower 64 bits of xmm2 into r9
00401677  mov     qword [data_4040b0], r9   ; saves r9 to a static variable
0040167f  mov     qword [data_4040a8], r8   ; and r8 to another
00401687  div     rbx                       ; *divides* (rdx:rax) by rbx(!)

    ; falls through into print_failed_to_validate_and_exit

There’s clearly no jump here, we just continue on to the error message. The only way I can think of that would cause us to bail from these few instructions would be for one of them to trigger a fault of some kind. One approach would be to access memory that’s unreachable, but this code doesn’t compute any pointers to write to or read from.

Our only remaining option is with the last instruction div rbx, which divides rdx:rax by rbx; we could try to trigger a SIGFPE (arithmetic exception) by manipulating rdx and/or rax so that it ends up overflowing. Despite its name meaning “signal floating-point error”, SIGFPE can be thrown here during this integer division, in one of two cases (Intel reference manual, Vol. 2A 3-317 “Unsigned Divide”):

  1. If the denominator is zero, which we know is impossible here since we have a mov rbx, 0xffff231203 just above.
  2. If the result of the division doesn’t fit in 64 bits. Keep in mind that it’s not just rax that’s being divided by rbx, but rdx:rax. This syntax describes the concatenation over 128 bits of registers rdx and rax to create the numerator.
    For example, if we have rdx = 0x1234, rax = 0x5678, and rbx = 0xabcdef, what we’re computing is ((rdx << 64) | rax) / rbx, or 0x12340000000000005678 / 0xabcdef which produces the quotient 0x1b1fb592b526e1 in rax and the remainder 0x73dd69 in rdx. There’s no error since the quotient fits in 64 bits.
    But if we have a larger value in rdx:rax and the quotient is too large to write to rax, for example if we kept the same values except for rdx = 0xffffffff, the quotient would be 0x17d74fd494920f78247 which clearly doesn’t fit in 64 bits. This would trigger a SIGFPE signal.

Let’s see what this means about our input. Even if we somehow managed to make rax as large as possible (all bits set), the division by the constant in rbx would still succeed if rdx was left to zero. We can try to find some instructions that manipulate bits of rdx based on our input, and set enough bits that rdx:rax becomes large enough before the division. But going back to the caller, there really aren’t that many places that touch rdx; the only one I could find was at 0x04010e0 with a mov rdx, qword [stdin] but this is just setting up a parameter for fgets.

A quick side note about pextrq: here the offset is a qword offset and not a byte offset as in the sequence of pextrb instructions above. The documentation for PEXTRQ r/m64, xmm2, imm8 says: “Extract a qword integer value from xmm2 at the source qword offset specified by imm8 into r64”. With a source qword offset of 1, we copy the high (most significant) 64 bits of xmm2; if it was zero it would be the low (least significant) 64 bits.

A register left to chance

It doesn’t seem to me that there is any way to control the value of rdx from the input. While investigating this issue I noticed a difference between the behavior of the program when I ran it within a container vs directly in a virtual machine. Both systems were x86-64 running Debian 11 “Bullseye”, with the same password input. I set a breakpoint on the div instruction at 0x401687 with b *0x401687 in GDB, and looked up the registry values at that time. In the container I had a small value for rdx:

$rax   : 0x0
$rbx   : 0xffff231203
$rcx   : 0x10
$rdx   : 0x30

while in the VM it was a much larger number:

$rax   : 0x0
$rbx   : 0xffff231203
$rcx   : 0x10
$rdx   : 0x7fffffffdb70

With such a large value in rdx, the result of the division by rbx does not fit in 64 bits:

>>> rdx = 0x7fffffffdb70
>>> rax = 0
>>> rbx = 0xffff231203
>>> hex(((rdx << 64) | rax) // rbx)
'0x80006e77394526c345'

From what I can tell, it looks like whether the input is accepted as valid depends on an assumption regarding the value of rdx once we get to the div rbx instruction. Without a way to influence rdx from our inputs, this means that the validity of the password is likely platform-dependent.

In my case, I had started looking at this div with assumptions based on a misunderstanding: I mistakenly thought that the div rbx instruction would divide by rax instead of dividing (rdx:rax). This led me to try to set rax to zero, in order to divide by zero and trigger the signal. This is actually unnecessary, but for the record this is how I went about it:

<unnecessary-detour>

Here’s what a 128-bit SSE register would contain for the given input "hello, RE world!", as well as which parts are read by pextrb and pextrq commands:

hex editor

In pseudo-code, this is what we did:

uint8_t rax[8];
uint8_t rbx[8];

memcpy(rax, xmm3, sizeof(uint64_t));
memcpy(rbx, xmm2 + 1 * sizeof(uint64_t), sizeof(uint64_t));

for (int i = 0; i < 8; i++) {
    rax[i] ^= rbx[i];
}

Recall that xmm2 was set with movupd xmm2, xmmword [rdi+0x20] and xmm3 with movupd xmm3, xmmword [rdi+0x30]. In xmm2 we have input[32:47] (16 bytes), and in xmm3 we have input[48:63].

rax is therefore the XOR’d result of input[48:55] with input[40:47]. This is our new input constraint: having both of these sequences contain the same bytes would be enough to set rax to zero and trigger a SIGFPE in div rbx (edit: this is not what div rbx does).

</unnecessary-detour>

So all we need for this section is to run the binary on a machine where rdx will be set to a large value somewhere during the execution of the program. Whether this will happen likely depends on a number of factors. My attempts at running this in Docker from a variety of base images have all failed, but I was able to get a large value in rdx consistently by running the binary in a Google Cloud Shell.

From this point going forward I will assume that we’ve triggered the SIGFPE, meaning that we’re running the binary on a machine where rdx ends up having a large value when the div rbx instruction is reached.

Discovering the signal handler

While this error prevents us from continuing through to the error message, we still have to figure out where we end up. How was the signal even caught, did we miss something in main? I had first opened this executable with Binary Ninja, which identifies the code at 0x004010b0 as being the main function. But if we’d just had a quick glance a few lines above main, we would have seen a signal handler being installed for signal 8 (SIGFPE):

00401090  lea     rsi, [rel sub_4013e0] ; presumably a signal handler
00401097  mov     edi, 0x8              ; signal 8, SIGFPE
0040109c  jmp     signal                ; note the `jmp`, not `call`

It is interesting that these instructions are not reachable from main, but instead seem to be configured as a prelude to main itself. This sort of trickery can let developers run code before main without making it too obvious, and I believe this is the first time I’ve analyzed a crackme using this technique. With GCC, doing so can be achieved with a function attribute like constructor which causes the function to be called automatically before execution enters main().

The code jumps to a call instruction for the signal libc function (see man page here). We see once again edi and rsi used as first and second parameters; this matches the expected order in which arguments are passed in to sig_t signal(int sig, sig_t func). If we’re hoping to trigger a SIGFPE, the next logical step is to look at the handler that was registered and see where we land.

At first glance, sub_4013e0 already looks promising. We can spot references to data_4040b0 and data_4040a8 (the two variables we had written to before the division), and there are also a bunch of transformations on them using xor and sub. We have a straight jmp to print_failed_to_validate_and_exit, and a few conditional jumps including one to 0x40143d where some instructions refer to the string "You found the password:\n\t" at 0x00402010. We might be close!

Let’s take a step back to review what we had written to data_4040b0 and data_4040a8:

00401661  mov     rbx, 0xffff231203         ; puts this long constant into rbx
0040166b  pextrq  r8, xmm3, 0x1             ; extracts 64 bits again from qword offset 1 in xmm3 into r8
00401672  movq    r9, xmm2                  ; and the lower 64 bits of xmm2 into r9
00401677  mov     qword [data_4040b0], r9   ; saves r9 to a static variable
0040167f  mov     qword [data_4040a8], r8   ; and r8 to another

This is approximately equivalent to:

uint64_t rbx, r8, r9;
uint8_t xmm2[16], xmm3[16];  // 128-bit registers

rbx = 0xffff231203; // doesn't really matter since it's unused
memcpy(&r8, xmm3 + sizeof(uint64_t), sizeof(uint64_t)); // quadword copy at offset 1 qword (so 8 bytes)
memcpy(&r9, xmm2, sizeof(uint64_t)); // quadword copy from the lowest 64 bits

*((uint64_t*)0x004040b0) = r9;
*((uint64_t*)0x004040a8) = r8;

Now that we’re entering sub_4013e0, let’s see what’s between us and the success message. There are 6 conditional jumps, and if we managed to take the right ones and skip the others, it would lead us straight to the successful end of this program:

004013e0  mov     rdi, qword [rel data_4040b0]
004013e7  mov     r8, qword [rel data_4040a8]
004013ee  push    rbp       ; saves rbp
004013ef  cmp     rdi, r8   ; if the two variables match
004013f2  je      0x4014c0  ; jumps to error message

004013f8  cmp     rdi, 0xf  ; if rdi > 15
004013fc  ja      0x401490  ; calls `sub_401200`, changes eax, and continues at 00401413 (label A)

00401402  sub     rdi, 0x1  ; rdi--
00401406  xor     r9d, r9d  ; resets r9d
00401409  cmp     rdi, 0xe  ; if rdi <= 14
0040140d  jbe     0x4014b0  ; sets r9d to a value from an array, then jumps to label A

; (label A is here)
00401413  cmp     r8, 0xf   ; if r8 > 15
00401417  ja      0x401430  ; jumps to label D

00401419  sub     r8, 0x1   ; r8--
0040141d  xor     eax, eax  ; resets eax
0040141f  cmp     r8, 0xe   ; if r8 <= 14
00401423  jbe     0x4014a0  ; sets eax to a value from an array, then jumps to label B

; (label B is here)
00401425  cmp     eax, r9d
00401428  je      0x40143d  ; jump to label E: password found!

; (label C is here)
0040142a  pop     rbp       ; restores rbp
0040142b  jmp     print_failed_to_validate_and_exit
    ; (does not return)

; (label D is here)
00401430  mov     rdi, r8       ; copies value
00401433  call    sub_401200    ; calls this mystery function again
00401438  cmp     eax, r9d      ; if eax != r9d
0040143b  jne     0x40142a      ; then jump to label C

; (label E is here)
0040143d  mov     edx, 0x19             ; third argument: length=25
00401442  lea     qword rsi, [0x402010] ; second argument: "You found the password:\n\t"
00401449  mov     edi, 0x1              ; first argument: stdout
0040144e  call    write                 ; calls write(stdout, "You found…", 25)

00401453  mov     rbp, qword [0x4040a0] ; points to our input
0040145a  mov     rdi, rbp              ; first (and only) argument of strlen
0040145d  call    strlen                ; calls strlen on our input, result goes into eax

00401462  mov     rsi, rbp              ; second argument: our input
00401465  mov     edi, 0x1              ; first argument: stdout
0040146a  mov     rdx, rax              ; third argument: the result of strlen
0040146d  call    write                 ; calls write(stdout, input, strlen(input))

00401472  mov     edi, 0x1              ; first argument: stdout
00401477  mov     edx, 0x1              ; third argument: length=1
0040147c  lea     rsi, qword [0x40202a] ; points to a single '\n' character
00401483  call    write                 ; calls write(stdout, "\n", 1)

00401488  xor     edi, edi              ; resets edi
0040148a  call    exit                  ; calls exit(0)

Recall that the data at data_4040b0 was set to input[32:40] (via r9) and the data in data_4040a8 was set to input[56:63] (via r8).

The two short jumps at 0x004013fc and 0x0040140d refer to 0x00402040, and both are for conditions where a register is less than 15. The data at that address seems to contain integers: just by looking at a few bytes there, we can recognize it as an array of int. We can tell that it contains 15 values since exactly 60 bytes after it starts we have a header for an exception handler frame.

This is what’s at 0x00402040:

00402040     01 00 00 00 01 00 00 00 02 00 00 00 01 00 00 00 02 00 00 00
00402054     02 00 00 00 03 00 00 00 01 00 00 00 02 00 00 00 02 00 00 00
00402068     03 00 00 00 02 00 00 00 03 00 00 00 03 00 00 00 04 00 00 00

Which decodes to:

int32_t data_402040[] = { 1, 1, 2, 1, 2,
                          2, 3, 1, 2, 2,
                          3, 2, 3, 3, 4 };

There are multiple calls to sub_401200, but let’s put off exploring this long function before we have a better understanding of where we are now.

Using Ghidra, we can decompile the current block at 0x004013e0 into a rough estimate of what the code looks like, and refine it into something more readable. After renaming a few variables and referencing the integer array, I ended up with this listing:

void sub_004013e0() {
    int validation_1;
    int validation_2;
    size_t len;
    uint32_t var_r8;
    uint32_t var_rdi;
    char *ptr;

    // reading back the two values we saved earlier
    var_r8 =  *(0x004040a8);
    var_rdi = *(0x004040b0);

    // the two can't be the same
    if (var_rdi == var_r8) {
        print_failed_to_validate_and_exit();
    }

    // first validation, with `rdi`
    if (var_rdi < 16) { // unlikely with character input
        validation_1 = 0;
        if (var_rdi - 1 < 15) {
            validation_1 = data_402040[var_rdi - 1];
        }
    } else { // so we'll probably call `sub_401200`
        validation_1 = sub_401200(var_rdi);
    }

    // second validation, with `r8` this time but pretty much the same
    if (var_r8 < 16) { // also difficult with ascii input
        validation_2 = 0;
        if (var_r8 - 1 < 15) {
            validation_2 = data_402040[var_r8 - 1];
        }
        if (validation_2 == validation_1) { // success
success:
            write(1,"You found the password:\n\t",0x19);
            ptr = input_backup;
            len = strlen(input_backup);
            write(1,ptr,len);
            write(1,&new_line,1);
            exit(0);
        }
    } else { // we're likely to end up here instead
        validation_2 = sub_401200(var_r8);  // calling this function again, with r8 this time
        if (validation_2 == validation_1) goto success; // and hopefully jump into the success block
    }
    print_failed_to_validate_and_exit();
}

With this listing, it seems at first like there might be a direct path to the success message, depending on what ends up in validation_1 and validation_2 using var_rdi and var_8 as offsets into the integer array data_402040.

As the very first step, we make sure that the values of rdi and r8 are different; if they are both strictly less than 16 and both point to the same integer in the array, then we’d have a match. This doesn’t really seem possible: we pulled these two values as “quad-words” (64-bit) from xmm2 and xmm3, which contain some of our input bytes. We’d need them to be mostly null, with maybe a single remaining byte having a value under 16.

Therefore, our only alternative seems to be to match the return values of the two calls to sub_401200, saved in validation_1 and validation_2. If this was a hash function, then using the same input would get us to the same output, but here we have this additional constraint that rdi and r8 cannot be equal.

Since it looks like we don’t have a choice, let’s explore sub_410200 and figure out how we can make it return the same value twice with different inputs.

A “hash” function

sub_410200 starts by taking its parameter rdi and copying it to rcx and rdx:

00401200  mov     rcx, rdi
00401203  mov     rdx, rdi

After these two instructions, 16 blocks follow with a very similar structure to each other. They all differ by a small amount given by a constant, so this looks very much like an unrolled loop. The constant is unused in the first block, then has value 4 in the second, 8 in the third… up to 0x3c or 60 in the last. We’re counting from 0 to 64 in steps of 4. Let’s look at the third block (i=2) to understand the overall structure:

0040123d  mov     rcx, rdx  ; restores our input (originally rdi) into rcx
00401240  shr     rcx, 0x8  ; shift right by 8 (which is 4 times 2 if 2 is a block counter from 0 to 16)
00401244  and     ecx, 0xf  ; in the bottom 32 bits, keep only the lowest 4 (the high-32 are unchanged)
00401247  sub     rcx, 0x1  ; remove 1
0040124b  cmp     rcx, 0xe  ; compare to 14
0040124f  ja      0x40125b  ; if strictly greater, skip over the next two instructions

00401251  lea     rsi, [rel data_402040]    ; if it was lower or equal, then use the `data_402040` array
00401258  add     eax, dword [rsi+rcx*4]    ; and add the value at (integer) offset rcx to eax.

Converted back to a loop, we have something like:

uint64_t sub_401200(uint64_t rdi) {
    uint64_t rdx = rdi; // back up the input
    uint64_t eax = 0; // reset the accumulator

    for (int i = 0; i < 16; i++) {
        uint64_t rcx = rdx;
        rcx >>= (i * 4);    // shift by 4 more bits in each loop
        rcx &= 0xf;         // mask: keep only the 4 least significant bits
        rcx--;              // decrement

        // use rcx as an index into the array
        if (rcx <= 14) {
            eax += data_402400[rcx]; // add this number to eax
        }
    }
    return eax;
}

We can think of this loop as doing 16 passes over the 16 “hex digits” of rdi, where for each one it’ll decrement the “digit” by one and use this value to read the number at that offset in the array, summing them all.

If our input was 0x1122334455667788, then we’d compute data_402400[0] + data_402400[0] + data_402400[1] + … + data_402400[7].

This is our last condition: running this algorithm on var_r8 and var_rdi must produce the same value, which seems feasible given that the array data_402400 has many duplicate values. In fact we don’t even have to compute them by hand: if their values are an anagram of each other in hex, then we’ll compute the same offsets into the array and add those values to the running total in eax. Even just swapping two characters would be enough.

Putting it all together

Let’s review the constraints we found to see if we can come up with an input string that satisfies them all.

Given secret = "dG46rskj8-457(~:",

  1. The first 16 characters first XOR’d with secret and then together must be zero
  2. The sum of the next 16 bytes must equal (0x88 × (input[11] ^ secret[11]) + 0x13a)
  3. The sequence from 48 to 55 must match 40 to 47 for us to trigger the SIGFPE
    (as we saw this isn’t actually required)
  4. sub_401200(input[32:47]) must be equal to sub_401200(input[48:63]), but input[32:47] must not be equal to input[48:63]

An easy way to satisfy the first constraint is to use secret as the first 16 bytes, but this would give us input[11] ^ secret[11] = 0, so the sum of the next 16 bytes would have to equal 0x13a, which is only 314 – an average of 19 per byte and far from printable characters.

Let’s first compute all the possible values for this sum:

values = set()
secret = "dG46rskj8-457(~:"
for a in map(ord, secret):
    for b in map(ord, secret):
        total = 0x88 * (a ^ b) + 0x13a
        values.add((chr(a), chr(b), total))

Then see if one of the totals is easily reachable using 16 characters with 2 different values:

>>> list(filter(lambda x: x[1] == '5' and chr(x[2]//16).isascii(), values))
[('7', '5', 586), ('6', '5', 722), ('4', '5', 450), ('5', '5', 314)]

722 is 16 × 45 + 2, with 45 being the ascii code for '-' and 46 the one for '.'; 16 dashes would sum up to 720, but replacing two of them with dots brings us to 722:

>>> sum(map(ord, ('-' * 14 + '.' * 2)))
722
>>> sum(map(ord, ('-' * 15 + '/')))     # there are
722
>>> sum(map(ord, (',' * 14 + '55')))    # countless
722
>>> sum(map(ord, ('+' * 12 + '1346')))  # other options…
722

Our target is 722 if input[11] and secret[11] are '6' and '5', respectively.

We could try having input being almost the same as secret, except with characters at offsets 3 and 11 swapped:

index0123456789101112131415
inputdG45rskj8-467(~:
secretdG46rskj8-457(~:

The first constraint is satisfied:

>>> input0_16 = "dG45rskj8-467(~:"
>>> secret    = "dG46rskj8-457(~:"
>>> from functools import reduce
>>> reduce(lambda a,b: a^b, [ord(secret[i]) ^ ord(input0_16[i]) for i in range(16)])
0

and so is the second:

>>> input16_32 = '-' * 14 + '.' * 2
>>> sum(map(ord, input16_32))
722
>>> sum(map(ord, input16_32)) == 0x88 * (ord(input0_16[11]) ^ ord(secret[11])) + 0x13a
True

Here’s what it looks like for the remaining constraints:

hex editor

For the third one, we can pick anything we want as long as the right ranges match. In xmm2:

index32333435363738394041424344454647
input????????ABCDEFGH

and xmm3:

index48495051525354555657585960616263
inputABCDEFGH????????

(the question marks ? are placeholders and could be any character, for now).

To this we’ll need to add the last constraint, which is to match the “hash” produced by sub_401200. Since this hash is computed by using each “hex digit” as an index into an array, we can have two different letters swapped and the sum will end up being the same.

If we use the characters 1 and 2 (ASCII 0x31 & 0x32) with 12 in the first string and 21 in the other, the “hex letters” will be [3,1,3,2] for 12 and [3,2,3,1] for 21. After decrementing at 0x00401247, the hash will be computed from the offsets [2,0,0,1] in one case and [0,1,2,0] in the other. The sum will be the same.

We can keep our question marks but change [32:33] to 12 and [56:57] to 21 so that input[32:39] is not equal to input[56:63]:

index32333435363738394041424344454647
input12??????ABCDEFGH

and

index48495051525354555657585960616263
inputABCDEFGH21??????

All together:

$ ./cm001
Input:
dG45rskj8-467(~:--------------..12??????ABCDEFGHABCDEFGH21??????
You found the password:
	dG45rskj8-467(~:--------------..12??????ABCDEFGHABCDEFGH21??????

Of course, changing the 12/21 to any other pair of swapped characters works just fine, and there are many more changes we could make within the range of these constraints. They are flexible enough to allow a whole range of valid inputs:

$ ./cm001
Input:
dG45rskj8-476(~:$$$$$$%%%%%%%%%%https://re.kv.iore.kv.iothtps://
You found the password:
	dG45rskj8-476(~:$$$$$$%%%%%%%%%%https://re.kv.iore.kv.iothtps://

Closing thoughts

I really enjoyed reversing this binary! It had a good combination of challenges, and the signal handler trick with SIGFPE was something I had never seen in a crackme. This is the first post where I attached handwritten notes; I hope these are as useful to readers as they are to me.

June 6, 2022 update: The crackme author JCWasmx86 got in touch, and provided a link to a recently published GitHub repo containing the source code for it. Note that the binary needs to be built with -O3 for it to behave as expected.

While it’s great to have the source code, I would recommend only looking at it once you understand the disassembled binary.

June 23, 2022 update: While discussing this crackme with the author, I realized that the “valid” solutions I had found were not accepted when I ran the program in a container. This led me to find the mistake I had made about div rbx not actually dividing by rax, and to discover that whether a solution is accepted is partially dependent on assumptions from the author regarding the value of the rdx register when the div instruction is executed. Interestingly, the author stated that in their experience the -O3 build flag had an impact on the value taken by the rdx register before the div instruction.

For the sake of transparency, a past copy of this article with the error is still readable thanks to the Internet Archive.