Crackme 21: JCWasmx86’s cm001
After a break of a few months where I pursued other projects, I was missing a good RE challenge so looked for new submissions on crackmes.one. I found JCWasmx86’s cm001 crackme which had good ratings, so decided to give it a try. As usual, the binary is mirrored here.
The description says it’s written in C and assembly, and that it runs on Linux. The goal is to “find one of the valid passwords. (There are many possible ones)”.
Let’s start by running the binary:
$ ./cm001
Input:
hello
Failed to validate password
$ echo $?
1
strings reveals a promising message:
$ strings ./cm001 | grep password
You found the password:
Failed to validate password
Reading the input in main ¶
Jumping into disassembly, we can easily find main since this binary seems to have most if not all of its symbols intact. In main, a 65-byte buffer is filled with null bytes and used to receive user input using fgets:
004010b1 pxor xmm0, xmm0 ; erases the 128-bit xmm0 register
004010b5 lea rdi, [rel data_40202c] ; loads a string
004010bc sub rsp, 0x50 ; reserves some stack space
004010c0 movaps xmmword [rsp {var_58}], xmm0 ; moves these zero'd 128 bits to var_58
004010c4 mov rbp, rsp {var_58}
004010c7 movaps xmmword [rsp+0x10 {var_48}], xmm0 ; erases the next 16 bytes
004010cc movaps xmmword [rsp+0x20 {var_38}], xmm0 ; and 16 more
004010d1 movaps xmmword [rsp+0x30 {var_28}], xmm0 ; and 16 more
004010d6 mov byte [rsp+0x40 {var_18}], 0x0 ; as well as a terminating null byte
004010db call puts ; prints "Input:"
004010e0 mov rdx, qword [rel stdin] ; arg to fgets: read from stdin
004010e7 mov esi, 0x41 ; fgets will read up to 65-1 bytes
004010ec mov rdi, rbp {var_58} ; into var_58
004010ef call fgets
004010f4 mov rdi, rbp {var_58} ; then use var_58
004010f7 call strlen ; as an argument to strlen
004010fc mov rdi, rbp {var_58}
004010ff mov esi, eax ; eax contains the result from strlen
00401101 call sub_40152b ; jump somewhere else
Note the use of the SSE register xmm0, which is 16 bytes or 128 bits wide.
The sequence above corresponds approximately to:
char var_58[65] = {0};
puts("Input:");
fgets(var_58, sizeof(var_58), stdin);
size_t eax = strlen(var_58);
sub_40152b(var_58, eax);
A first input constraint on xmm0 ¶
Let’s continue in sub_40152b. I assumed above that it took 2 parameters, since the user input is placed in rdi and the length in esi. These are the first 2 registers used to pass parameters in the System V AMD64 ABI calling convention.
This function starts with an immediate comparison of the input length and 0x40, jumping over a large block of code if the two don’t match. We can probably infer that our password needs to be 64 characters long:
0040152b cmp esi, 0x40 ; compare length and 0x40
0040152e mov qword [data_4040a0], rdi ; also store a pointer to our input at this address
00401536 jne sub_40168a ; no match: jump way down
Right away, we start unpacking the input into SSE registers, from xmm0 to xmm3. That’s 4 registers of 16 bytes each, covering the full 64 bytes of expected input:
0040153c movupd xmm0, xmmword [rdi] ; first 16 bytes
00401540 movupd xmm1, xmmword [rdi+0x10] ; next 16…
00401545 movupd xmm2, xmmword [rdi+0x20] ; etc.
0040154a movupd xmm3, xmmword [rdi+0x30]
A long sequence of XOR operations follows, one over 16 bytes and the rest over single bytes. Its starts with xmm0 being XOR’d with the 128-bit xmmword at data_404080, which points to the string "dG46rskj8-457(~:". After this initial wide XOR, the program extracts single bytes from the xmm0 register with pextrb (extracts a single byte at the given offset), and using eax as an XOR “accumulator” combines the extracted bytes together:
0040154f pxor xmm0, xmmword [data_404080] ; XORs this "secret" string with our first 16 bytes
00401558 xor eax, eax ; resets eax
0040155a pextrb r8d, xmm0, 0x0 ; r8d <- input[0] ^ secret[0]
00401561 xor eax, r8d ; eax ^= r8d
00401564 pextrb r8d, xmm0, 0x1 ; r8d <- input[1] ^ secret[1]
0040156b xor eax, r8d ; eax ^= r8d
0040156e pextrb r8d, xmm0, 0x2 ; ... the same
00401575 xor eax, r8d
00401578 pextrb r8d, xmm0, 0x3
0040157f xor eax, r8d
00401582 pextrb r8d, xmm0, 0x4
00401589 xor eax, r8d
0040158c pextrb r8d, xmm0, 0x5
00401593 xor eax, r8d
00401596 pextrb r8d, xmm0, 0x6
0040159d xor eax, r8d
004015a0 pextrb r8d, xmm0, 0x7
004015a7 xor eax, r8d
004015aa pextrb r8d, xmm0, 0x8
004015b1 xor eax, r8d
004015b4 pextrb r8d, xmm0, 0x9
004015bb xor eax, r8d
004015be pextrb r8d, xmm0, 0xa
004015c5 xor eax, r8d
004015c8 pextrb r8d, xmm0, 0xb
004015cf xor eax, r8d
004015d2 pextrb r8d, xmm0, 0xc
004015d9 xor eax, r8d
004015dc pextrb r8d, xmm0, 0xd
004015e3 xor eax, r8d
004015e6 pextrb r8d, xmm0, 0xe
004015ed xor eax, r8d
004015f0 pextrb r8d, xmm0, 0xf ; all the way to the 16th byte
004015f7 xor eax, r8d ; all XOR'd together
004015fa cmp eax, 0x0 ; and if all of that is not zero
004015fd jne sub_40168a ; then jump somewhere.
To think about these bytes, we have to consider the endianness of the plaform: with input[0:16] XOR dG46rs… stored in xmm0, using pextrb to extract a byte at a specific offset will select that byte starting from the least significant one. Here, the instruction at 0x0040155a that pulls byte zero with pextrb r8d, xmm0, 0x0 is setting r8d to input[0] XOR secret[0].
A quick look at sub_40168a tells us what we suspected: it prints an error message and exits:
0040168a mov rax, 0x1 ; system call 1: sys_write
00401691 mov rdi, 0x1 ; fd=1, that's stdout
00401698 mov rsi, data_404060 {"Failed to validate password\n"}
0040169f mov rdx, 0x1c ; 0x1c is the message length, 28 bytes
004016a6 syscall ; prints it
004016a8 mov rax, 0x3c ; syscall 0x3c is sys_exit
004016af mov rdi, 0x1 ; with status code 1.
004016b6 syscall ; bye
From this point, I’ve renamed sub_40168a to print_failed_to_validate_and_exit to make it more visible.
Now that we have our first constraint to apply to the first 16 bytes of input, let’s continue back after the jne that started our detour.
A second constraint, now on xmm1 ¶
00401603 mov rax, 0x11 ; rax is set to 0x11
0040160a add rax, rax ; doubled
0040160d xor r8, r8 ; r8 is reset to zero
00401610 pextrb r8d, xmm0, 0xb ; r8d <- input[11] ^ secret[11]
00401617 imul rax, r8 ; rax <- rax * r8
0040161b add rax, rax ; rax <- rax * 2
0040161e add rax, rax ; rax <- rax * 2
00401621 add rax, 0x13a ; rax <- rax + 0x13a
00401627 movupd xmmword [data_4040b0], xmm1 ; copies xmm1 into this static variable
00401630 xor rcx, rcx ; resets rcx (used below as a counter)
00401633 mov r8, 0x4040b0 ; r8 <- 0x4040b0 (the address of the variable we just wrote xmm1 to)
0040163a xor rbx, rbx ; resets rbx
0040163d mov bl, byte [r8+rcx] ; takes one byte from xmm1[counter] <――╮
00401641 sub rax, rbx ; removes it from rax │
00401644 inc rcx ; increments the counter │
00401647 cmp ecx, 0x10 ; if not equal to 16 │
0040164a jne 0x40163d ; loop back up ―――――――――――――――――――――――╯
0040164c cmp rax, 0x0
00401650 jne print_failed_to_validate_and_exit
Side-note: r8d and r8b are to r8 what eax and al are to rax.
To recap in pseudo-code:
// we start by computing a value based on input[11] ^ secret[11]
rax = 0x11 * 2;
r8d = input[11] ^ secret[11];
rax *= r8; // multiplied
rax += rax; // doubled
rax += rax; // doubled again
rax += 0x13a;
// equivalent to: rax = (0x88 * (input[11] ^ secret[11]) + 0x13a)
ptr = xmm1 (== &input[16]);
counter = 0;
do { // for each of the next 16 bytes (xmm1 goes from 16 to 31)
r8 = ptr;
rbx = 0;
bl = ptr[counter]; // we take byte[i]
rax -= bl; // and remove it from the computed value rax
counter++;
} while (counter != 16);
if (rax != 0) { // they must match
print_failed_to_validate_and_exit();
}
This gives us another constraint on the input.
Escaping the error message ¶
Looking at the instructions below this point, we start to notice an impasse: the jumps to print_failed_to_validate_and_exit were just jumps, not function calls. And we’re fast approaching this location, in fact there are no conditional jumps between the last jne print_failed_to_validate_and_exit we saw at 0x00401650 and the error message print itself. It almost feels like we went down the wrong path, since we clearly have no way out.
Here’s where we are, in context:

With seemingly no way to avoid the error message, let’s take a closer look at the few instructions ahead of us. Starting at 0x00401652:
00401652 movq rax, xmm3 ; movq takes the lower 64 bits from xmm3
00401657 pextrq rbx, xmm2, 0x1 ; extracts 64 bits from byte 1 of xmm2, into rbx
0040165e xor rax, rbx ; xors the two together, stores the result in rax
00401661 mov rbx, 0xffff231203 ; puts this long constant into rbx (no clear reason for this)
0040166b pextrq r8, xmm3, 0x1 ; extracts 64 bits again from offset 1 in xmm3 into r8
00401672 movq r9, xmm2 ; and the lower 64 bits of xmm2 into r9
00401677 mov qword [data_4040b0], r9 ; saves r9 to a static variable
0040167f mov qword [data_4040a8], r8 ; and r8 to another
00401687 div rbx ; *divides* (rdx:rax) by rbx(!)
; falls through into print_failed_to_validate_and_exit
There’s clearly no jump here, we just continue on to the error message. The only way I can think of that would cause us to bail from these few instructions would be for one of them to trigger a fault of some kind. One approach would be to access memory that’s unreachable, but this code doesn’t compute any pointers to write to or read from.
Our only remaining option is with the last instruction div rbx, which divides rdx:rax by rbx; we could try to trigger a SIGFPE (arithmetic exception) by manipulating rdx and/or rax so that it ends up overflowing. Despite its name meaning “signal floating-point error”, SIGFPE can be thrown here during this integer division, in one of two cases (Intel reference manual, Vol. 2A 3-317 “Unsigned Divide”):
- If the denominator is zero, which we know is impossible here since we have a
mov rbx, 0xffff231203just above. - If the result of the division doesn’t fit in 64 bits. Keep in mind that it’s not just
raxthat’s being divided byrbx, butrdx:rax. This syntax describes the concatenation over 128 bits of registersrdxandraxto create the numerator.
For example, if we haverdx = 0x1234,rax = 0x5678, andrbx = 0xabcdef, what we’re computing is((rdx << 64) | rax) / rbx, or0x12340000000000005678 / 0xabcdefwhich produces the quotient0x1b1fb592b526e1inraxand the remainder0x73dd69inrdx. There’s no error since the quotient fits in 64 bits.
But if we have a larger value inrdx:raxand the quotient is too large to write torax, for example if we kept the same values except forrdx = 0xffffffff, the quotient would be0x17d74fd494920f78247which clearly doesn’t fit in 64 bits. This would trigger aSIGFPEsignal.
Let’s see what this means about our input. Even if we somehow managed to make rax as large as possible (all bits set), the division by the constant in rbx would still succeed if rdx was left to zero. We can try to find some instructions that manipulate bits of rdx based on our input, and set enough bits that rdx:rax becomes large enough before the division. But going back to the caller, there really aren’t that many places that touch rdx; the only one I could find was at 0x04010e0 with a mov rdx, qword [stdin] but this is just setting up a parameter for fgets.
A quick side note about pextrq: here the offset is a qword offset and not a byte offset as in the sequence of pextrb instructions above. The documentation for PEXTRQ r/m64, xmm2, imm8 says: “Extract a qword integer value from xmm2 at the source qword offset specified by imm8 into r64”. With a source qword offset of 1, we copy the high (most significant) 64 bits of xmm2; if it was zero it would be the low (least significant) 64 bits.
A register left to chance ¶
It doesn’t seem to me that there is any way to control the value of rdx from the input. While investigating this issue I noticed a difference between the behavior of the program when I ran it within a container vs directly in a virtual machine. Both systems were x86-64 running Debian 11 “Bullseye”, with the same password input. I set a breakpoint on the div instruction at 0x401687 with b *0x401687 in GDB, and looked up the registry values at that time. In the container I had a small value for rdx:
$rax : 0x0
$rbx : 0xffff231203
$rcx : 0x10
$rdx : 0x30
while in the VM it was a much larger number:
$rax : 0x0
$rbx : 0xffff231203
$rcx : 0x10
$rdx : 0x7fffffffdb70
With such a large value in rdx, the result of the division by rbx does not fit in 64 bits:
>>> rdx = 0x7fffffffdb70
>>> rax = 0
>>> rbx = 0xffff231203
>>> hex(((rdx << 64) | rax) // rbx)
'0x80006e77394526c345'
From what I can tell, it looks like whether the input is accepted as valid depends on an assumption regarding the value of rdx once we get to the div rbx instruction. Without a way to influence rdx from our inputs, this means that the validity of the password is likely platform-dependent.
In my case, I had started looking at this div with assumptions based on a misunderstanding: I mistakenly thought that the div rbx instruction would divide by rax instead of dividing (rdx:rax). This led me to try to set rax to zero, in order to divide by zero and trigger the signal. This is actually unnecessary, but for the record this is how I went about it:
<unnecessary-detour>
Here’s what a 128-bit SSE register would contain for the given input "hello, RE world!", as well as which parts are read by pextrb and pextrq commands:

In pseudo-code, this is what we did:
uint8_t rax[8];
uint8_t rbx[8];
memcpy(rax, xmm3, sizeof(uint64_t));
memcpy(rbx, xmm2 + 1 * sizeof(uint64_t), sizeof(uint64_t));
for (int i = 0; i < 8; i++) {
rax[i] ^= rbx[i];
}
Recall that xmm2 was set with movupd xmm2, xmmword [rdi+0x20] and xmm3 with movupd xmm3, xmmword [rdi+0x30]. In xmm2 we have input[32:47] (16 bytes), and in xmm3 we have input[48:63].
rax is therefore the XOR’d result of input[48:55] with input[40:47]. This is our new input constraint: having both of these sequences contain the same bytes would be enough to set rax to zero and trigger a (edit: this is not what SIGFPE in div rbxdiv rbx does).
</unnecessary-detour>
So all we need for this section is to run the binary on a machine where rdx will be set to a large value somewhere during the execution of the program. Whether this will happen likely depends on a number of factors. My attempts at running this in Docker from a variety of base images have all failed, but I was able to get a large value in rdx consistently by running the binary in a Google Cloud Shell.
From this point going forward I will assume that we’ve triggered the SIGFPE, meaning that we’re running the binary on a machine where rdx ends up having a large value when the div rbx instruction is reached.
Discovering the signal handler ¶
While this error prevents us from continuing through to the error message, we still have to figure out where we end up. How was the signal even caught, did we miss something in main? I had first opened this executable with Binary Ninja, which identifies the code at 0x004010b0 as being the main function. But if we’d just had a quick glance a few lines above main, we would have seen a signal handler being installed for signal 8 (SIGFPE):
00401090 lea rsi, [rel sub_4013e0] ; presumably a signal handler
00401097 mov edi, 0x8 ; signal 8, SIGFPE
0040109c jmp signal ; note the `jmp`, not `call`
It is interesting that these instructions are not reachable from main, but instead seem to be configured as a prelude to main itself. This sort of trickery can let developers run code before main without making it too obvious, and I believe this is the first time I’ve analyzed a crackme using this technique. With GCC, doing so can be achieved with a function attribute like constructor which causes the function to be called automatically before execution enters main().
The code jumps to a call instruction for the signal libc function (see man page here). We see once again edi and rsi used as first and second parameters; this matches the expected order in which arguments are passed in to sig_t signal(int sig, sig_t func).
If we’re hoping to trigger a SIGFPE, the next logical step is to look at the handler that was registered and see where we land.
At first glance, sub_4013e0 already looks promising. We can spot references to data_4040b0 and data_4040a8 (the two variables we had written to before the division), and there are also a bunch of transformations on them using xor and sub. We have a straight jmp to print_failed_to_validate_and_exit, and a few conditional jumps including one to 0x40143d where some instructions refer to the string "You found the password:\n\t" at 0x00402010. We might be close!
Let’s take a step back to review what we had written to data_4040b0 and data_4040a8:
00401661 mov rbx, 0xffff231203 ; puts this long constant into rbx
0040166b pextrq r8, xmm3, 0x1 ; extracts 64 bits again from qword offset 1 in xmm3 into r8
00401672 movq r9, xmm2 ; and the lower 64 bits of xmm2 into r9
00401677 mov qword [data_4040b0], r9 ; saves r9 to a static variable
0040167f mov qword [data_4040a8], r8 ; and r8 to another
This is approximately equivalent to:
uint64_t rbx, r8, r9;
uint8_t xmm2[16], xmm3[16]; // 128-bit registers
rbx = 0xffff231203; // doesn't really matter since it's unused
memcpy(&r8, xmm3 + sizeof(uint64_t), sizeof(uint64_t)); // quadword copy at offset 1 qword (so 8 bytes)
memcpy(&r9, xmm2, sizeof(uint64_t)); // quadword copy from the lowest 64 bits
*((uint64_t*)0x004040b0) = r9;
*((uint64_t*)0x004040a8) = r8;
Now that we’re entering sub_4013e0, let’s see what’s between us and the success message. There are 6 conditional jumps, and if we managed to take the right ones and skip the others, it would lead us straight to the successful end of this program:
004013e0 mov rdi, qword [rel data_4040b0]
004013e7 mov r8, qword [rel data_4040a8]
004013ee push rbp ; saves rbp
004013ef cmp rdi, r8 ; if the two variables match
004013f2 je 0x4014c0 ; jumps to error message
004013f8 cmp rdi, 0xf ; if rdi > 15
004013fc ja 0x401490 ; calls `sub_401200`, changes eax, and continues at 00401413 (label A)
00401402 sub rdi, 0x1 ; rdi--
00401406 xor r9d, r9d ; resets r9d
00401409 cmp rdi, 0xe ; if rdi <= 14
0040140d jbe 0x4014b0 ; sets r9d to a value from an array, then jumps to label A
; (label A is here)
00401413 cmp r8, 0xf ; if r8 > 15
00401417 ja 0x401430 ; jumps to label D
00401419 sub r8, 0x1 ; r8--
0040141d xor eax, eax ; resets eax
0040141f cmp r8, 0xe ; if r8 <= 14
00401423 jbe 0x4014a0 ; sets eax to a value from an array, then jumps to label B
; (label B is here)
00401425 cmp eax, r9d
00401428 je 0x40143d ; jump to label E: password found!
; (label C is here)
0040142a pop rbp ; restores rbp
0040142b jmp print_failed_to_validate_and_exit
; (does not return)
; (label D is here)
00401430 mov rdi, r8 ; copies value
00401433 call sub_401200 ; calls this mystery function again
00401438 cmp eax, r9d ; if eax != r9d
0040143b jne 0x40142a ; then jump to label C
; (label E is here)
0040143d mov edx, 0x19 ; third argument: length=25
00401442 lea qword rsi, [0x402010] ; second argument: "You found the password:\n\t"
00401449 mov edi, 0x1 ; first argument: stdout
0040144e call write ; calls write(stdout, "You found…", 25)
00401453 mov rbp, qword [0x4040a0] ; points to our input
0040145a mov rdi, rbp ; first (and only) argument of strlen
0040145d call strlen ; calls strlen on our input, result goes into eax
00401462 mov rsi, rbp ; second argument: our input
00401465 mov edi, 0x1 ; first argument: stdout
0040146a mov rdx, rax ; third argument: the result of strlen
0040146d call write ; calls write(stdout, input, strlen(input))
00401472 mov edi, 0x1 ; first argument: stdout
00401477 mov edx, 0x1 ; third argument: length=1
0040147c lea rsi, qword [0x40202a] ; points to a single '\n' character
00401483 call write ; calls write(stdout, "\n", 1)
00401488 xor edi, edi ; resets edi
0040148a call exit ; calls exit(0)
Recall that the data at data_4040b0 was set to input[32:40] (via r9) and the data in data_4040a8 was set to input[56:63] (via r8).
The two short jumps at 0x004013fc and 0x0040140d refer to 0x00402040, and both are for conditions where a register is less than 15. The data at that address seems to contain integers: just by looking at a few bytes there, we can recognize it as an array of int. We can tell that it contains 15 values since exactly 60 bytes after it starts we have a header for an exception handler frame.
This is what’s at 0x00402040:
00402040 01 00 00 00 01 00 00 00 02 00 00 00 01 00 00 00 02 00 00 00
00402054 02 00 00 00 03 00 00 00 01 00 00 00 02 00 00 00 02 00 00 00
00402068 03 00 00 00 02 00 00 00 03 00 00 00 03 00 00 00 04 00 00 00
Which decodes to:
int32_t data_402040[] = { 1, 1, 2, 1, 2,
2, 3, 1, 2, 2,
3, 2, 3, 3, 4 };
There are multiple calls to sub_401200, but let’s put off exploring this long function before we have a better understanding of where we are now.
Using Ghidra, we can decompile the current block at 0x004013e0 into a rough estimate of what the code looks like, and refine it into something more readable. After renaming a few variables and referencing the integer array, I ended up with this listing:
void sub_004013e0() {
int validation_1;
int validation_2;
size_t len;
uint32_t var_r8;
uint32_t var_rdi;
char *ptr;
// reading back the two values we saved earlier
var_r8 = *(0x004040a8);
var_rdi = *(0x004040b0);
// the two can't be the same
if (var_rdi == var_r8) {
print_failed_to_validate_and_exit();
}
// first validation, with `rdi`
if (var_rdi < 16) { // unlikely with character input
validation_1 = 0;
if (var_rdi - 1 < 15) {
validation_1 = data_402040[var_rdi - 1];
}
} else { // so we'll probably call `sub_401200`
validation_1 = sub_401200(var_rdi);
}
// second validation, with `r8` this time but pretty much the same
if (var_r8 < 16) { // also difficult with ascii input
validation_2 = 0;
if (var_r8 - 1 < 15) {
validation_2 = data_402040[var_r8 - 1];
}
if (validation_2 == validation_1) { // success
success:
write(1,"You found the password:\n\t",0x19);
ptr = input_backup;
len = strlen(input_backup);
write(1,ptr,len);
write(1,&new_line,1);
exit(0);
}
} else { // we're likely to end up here instead
validation_2 = sub_401200(var_r8); // calling this function again, with r8 this time
if (validation_2 == validation_1) goto success; // and hopefully jump into the success block
}
print_failed_to_validate_and_exit();
}
With this listing, it seems at first like there might be a direct path to the success message, depending on what ends up in validation_1 and validation_2 using var_rdi and var_8 as offsets into the integer array data_402040.
As the very first step, we make sure that the values of rdi and r8 are different; if they are both strictly less than 16 and both point to the same integer in the array, then we’d have a match. This doesn’t really seem possible: we pulled these two values as “quad-words” (64-bit) from xmm2 and xmm3, which contain some of our input bytes. We’d need them to be mostly null, with maybe a single remaining byte having a value under 16.
Therefore, our only alternative seems to be to match the return values of the two calls to sub_401200, saved in validation_1 and validation_2. If this was a hash function, then using the same input would get us to the same output, but here we have this additional constraint that rdi and r8 cannot be equal.
Since it looks like we don’t have a choice, let’s explore sub_410200 and figure out how we can make it return the same value twice with different inputs.
A “hash” function ¶
sub_410200 starts by taking its parameter rdi and copying it to rcx and rdx:
00401200 mov rcx, rdi
00401203 mov rdx, rdi
After these two instructions, 16 blocks follow with a very similar structure to each other. They all differ by a small amount given by a constant, so this looks very much like an unrolled loop. The constant is unused in the first block, then has value 4 in the second, 8 in the third… up to 0x3c or 60 in the last. We’re counting from 0 to 64 in steps of 4. Let’s look at the third block (i=2) to understand the overall structure:
0040123d mov rcx, rdx ; restores our input (originally rdi) into rcx
00401240 shr rcx, 0x8 ; shift right by 8 (which is 4 times 2 if 2 is a block counter from 0 to 16)
00401244 and ecx, 0xf ; in the bottom 32 bits, keep only the lowest 4 (the high-32 are unchanged)
00401247 sub rcx, 0x1 ; remove 1
0040124b cmp rcx, 0xe ; compare to 14
0040124f ja 0x40125b ; if strictly greater, skip over the next two instructions
00401251 lea rsi, [rel data_402040] ; if it was lower or equal, then use the `data_402040` array
00401258 add eax, dword [rsi+rcx*4] ; and add the value at (integer) offset rcx to eax.
Converted back to a loop, we have something like:
uint64_t sub_401200(uint64_t rdi) {
uint64_t rdx = rdi; // back up the input
uint64_t eax = 0; // reset the accumulator
for (int i = 0; i < 16; i++) {
uint64_t rcx = rdx;
rcx >>= (i * 4); // shift by 4 more bits in each loop
rcx &= 0xf; // mask: keep only the 4 least significant bits
rcx--; // decrement
// use rcx as an index into the array
if (rcx <= 14) {
eax += data_402400[rcx]; // add this number to eax
}
}
return eax;
}
We can think of this loop as doing 16 passes over the 16 “hex digits” of rdi, where for each one it’ll decrement the “digit” by one and use this value to read the number at that offset in the array, summing them all.
If our input was 0x1122334455667788, then we’d compute data_402400[0] + data_402400[0] + data_402400[1] + … + data_402400[7].
This is our last condition: running this algorithm on var_r8 and var_rdi must produce the same value, which seems feasible given that the array data_402400 has many duplicate values. In fact we don’t even have to compute them by hand: if their values are an anagram of each other in hex, then we’ll compute the same offsets into the array and add those values to the running total in eax. Even just swapping two characters would be enough.
Putting it all together ¶
Let’s review the constraints we found to see if we can come up with an input string that satisfies them all.
Given secret = "dG46rskj8-457(~:",
- The first 16 characters first XOR’d with
secretand then together must be zero - The sum of the next 16 bytes must equal
(0x88 × (input[11] ^ secret[11]) + 0x13a) The sequence from 48 to 55 must match 40 to 47 for us to trigger the SIGFPE
(as we saw this isn’t actually required)sub_401200(input[32:47])must be equal tosub_401200(input[48:63]), butinput[32:47]must not be equal toinput[48:63]
An easy way to satisfy the first constraint is to use secret as the first 16 bytes, but this would give us input[11] ^ secret[11] = 0, so the sum of the next 16 bytes would have to equal 0x13a, which is only 314 – an average of 19 per byte and far from printable characters.
Let’s first compute all the possible values for this sum:
values = set()
secret = "dG46rskj8-457(~:"
for a in map(ord, secret):
for b in map(ord, secret):
total = 0x88 * (a ^ b) + 0x13a
values.add((chr(a), chr(b), total))
Then see if one of the totals is easily reachable using 16 characters with 2 different values:
>>> list(filter(lambda x: x[1] == '5' and chr(x[2]//16).isascii(), values))
[('7', '5', 586), ('6', '5', 722), ('4', '5', 450), ('5', '5', 314)]
722 is 16 × 45 + 2, with 45 being the ascii code for '-' and 46 the one for '.'; 16 dashes would sum up to 720, but replacing two of them with dots brings us to 722:
>>> sum(map(ord, ('-' * 14 + '.' * 2)))
722
>>> sum(map(ord, ('-' * 15 + '/'))) # there are
722
>>> sum(map(ord, (',' * 14 + '55'))) # countless
722
>>> sum(map(ord, ('+' * 12 + '1346'))) # other options…
722
Our target is 722 if input[11] and secret[11] are '6' and '5', respectively.
We could try having input being almost the same as secret, except with characters at offsets 3 and 11 swapped:
| index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
input | d | G | 4 | 5 | r | s | k | j | 8 | - | 4 | 6 | 7 | ( | ~ | : |
secret | d | G | 4 | 6 | r | s | k | j | 8 | - | 4 | 5 | 7 | ( | ~ | : |
The first constraint is satisfied:
>>> input0_16 = "dG45rskj8-467(~:"
>>> secret = "dG46rskj8-457(~:"
>>> from functools import reduce
>>> reduce(lambda a,b: a^b, [ord(secret[i]) ^ ord(input0_16[i]) for i in range(16)])
0
and so is the second:
>>> input16_32 = '-' * 14 + '.' * 2
>>> sum(map(ord, input16_32))
722
>>> sum(map(ord, input16_32)) == 0x88 * (ord(input0_16[11]) ^ ord(secret[11])) + 0x13a
True
Here’s what it looks like for the remaining constraints:

For the third one, we can pick anything we want as long as the right ranges match. In xmm2:
| index | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
input | ? | ? | ? | ? | ? | ? | ? | ? | A | B | C | D | E | F | G | H |
and xmm3:
| index | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
input | A | B | C | D | E | F | G | H | ? | ? | ? | ? | ? | ? | ? | ? |
(the question marks ? are placeholders and could be any character, for now).
To this we’ll need to add the last constraint, which is to match the “hash” produced by sub_401200. Since this hash is computed by using each “hex digit” as an index into an array, we can have two different letters swapped and the sum will end up being the same.
If we use the characters 1 and 2 (ASCII 0x31 & 0x32) with 12 in the first string and 21 in the other, the “hex letters” will be [3,1,3,2] for 12 and [3,2,3,1] for 21. After decrementing at 0x00401247, the hash will be computed from the offsets [2,0,0,1] in one case and [0,1,2,0] in the other. The sum will be the same.
We can keep our question marks but change [32:33] to 12 and [56:57] to 21 so that input[32:39] is not equal to input[56:63]:
| index | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
input | 1 | 2 | ? | ? | ? | ? | ? | ? | A | B | C | D | E | F | G | H |
and
| index | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
input | A | B | C | D | E | F | G | H | 2 | 1 | ? | ? | ? | ? | ? | ? |
All together:
$ ./cm001
Input:
dG45rskj8-467(~:--------------..12??????ABCDEFGHABCDEFGH21??????
You found the password:
dG45rskj8-467(~:--------------..12??????ABCDEFGHABCDEFGH21??????
Of course, changing the 12/21 to any other pair of swapped characters works just fine, and there are many more changes we could make within the range of these constraints. They are flexible enough to allow a whole range of valid inputs:
$ ./cm001
Input:
dG45rskj8-476(~:$$$$$$%%%%%%%%%%https://re.kv.iore.kv.iothtps://
You found the password:
dG45rskj8-476(~:$$$$$$%%%%%%%%%%https://re.kv.iore.kv.iothtps://
Closing thoughts ¶
I really enjoyed reversing this binary! It had a good combination of challenges, and the signal handler trick with SIGFPE was something I had never seen in a crackme. This is the first post where I attached handwritten notes; I hope these are as useful to readers as they are to me.
June 6, 2022 update: The crackme author
JCWasmx86 got in touch, and provided a link to a recently published
GitHub repo containing the source code for it. Note that the binary needs to be built with -O3 for it to behave as expected.
While it’s great to have the source code, I would recommend only looking at it once you understand the disassembled binary.
June 23, 2022 update: While discussing this crackme with the author, I realized that the “valid” solutions I had found were not accepted when I ran the program in a container. This led me to find the mistake I had made about div rbx not actually dividing by rax, and to discover that whether a solution is accepted is partially dependent on assumptions from the author regarding the value of the rdx register when the div instruction is executed. Interestingly, the author stated that in their experience the -O3 build flag had an impact on the value taken by the rdx register before the div instruction.
For the sake of transparency, a past copy of this article with the error is still readable thanks to the Internet Archive.