Crackme 21: JCWasmx86’s cm001
After a break of a few months where I pursued other projects, I was missing a good RE challenge so looked for new submissions on crackmes.one. I found JCWasmx86’s cm001 crackme which had good ratings, so decided to give it a try. As usual, the binary is mirrored here.
The description says it’s written in C and assembly, and that it runs on Linux. The goal is to “find one of the valid passwords. (There are many possible ones)”.
Let’s start by running the binary:
$ ./cm001
Input:
hello
Failed to validate password
$ echo $?
1
strings
reveals a promising message:
$ strings ./cm001 | grep password
You found the password:
Failed to validate password
Reading the input in main
¶
Jumping into disassembly, we can easily find main
since this binary seems to have most if not all of its symbols intact. In main
, a 65-byte buffer is filled with null bytes and used to receive user input using fgets
:
004010b1 pxor xmm0, xmm0 ; erases the 128-bit xmm0 register
004010b5 lea rdi, [rel data_40202c] ; loads a string
004010bc sub rsp, 0x50 ; reserves some stack space
004010c0 movaps xmmword [rsp {var_58}], xmm0 ; moves these zero'd 128 bits to var_58
004010c4 mov rbp, rsp {var_58}
004010c7 movaps xmmword [rsp+0x10 {var_48}], xmm0 ; erases the next 16 bytes
004010cc movaps xmmword [rsp+0x20 {var_38}], xmm0 ; and 16 more
004010d1 movaps xmmword [rsp+0x30 {var_28}], xmm0 ; and 16 more
004010d6 mov byte [rsp+0x40 {var_18}], 0x0 ; as well as a terminating null byte
004010db call puts ; prints "Input:"
004010e0 mov rdx, qword [rel stdin] ; arg to fgets: read from stdin
004010e7 mov esi, 0x41 ; fgets will read up to 65-1 bytes
004010ec mov rdi, rbp {var_58} ; into var_58
004010ef call fgets
004010f4 mov rdi, rbp {var_58} ; then use var_58
004010f7 call strlen ; as an argument to strlen
004010fc mov rdi, rbp {var_58}
004010ff mov esi, eax ; eax contains the result from strlen
00401101 call sub_40152b ; jump somewhere else
Note the use of the SSE register xmm0
, which is 16 bytes or 128 bits wide.
The sequence above corresponds approximately to:
char var_58[65] = {0};
puts("Input:");
fgets(var_58, sizeof(var_58), stdin);
size_t eax = strlen(var_58);
sub_40152b(var_58, eax);
A first input constraint on xmm0
¶
Let’s continue in sub_40152b
. I assumed above that it took 2 parameters, since the user input is placed in rdi
and the length in esi
. These are the first 2 registers used to pass parameters in the System V AMD64 ABI calling convention.
This function starts with an immediate comparison of the input length and 0x40
, jumping over a large block of code if the two don’t match. We can probably infer that our password needs to be 64 characters long:
0040152b cmp esi, 0x40 ; compare length and 0x40
0040152e mov qword [data_4040a0], rdi ; also store a pointer to our input at this address
00401536 jne sub_40168a ; no match: jump way down
Right away, we start unpacking the input into SSE registers, from xmm0
to xmm3
. That’s 4 registers of 16 bytes each, covering the full 64 bytes of expected input:
0040153c movupd xmm0, xmmword [rdi] ; first 16 bytes
00401540 movupd xmm1, xmmword [rdi+0x10] ; next 16…
00401545 movupd xmm2, xmmword [rdi+0x20] ; etc.
0040154a movupd xmm3, xmmword [rdi+0x30]
A long sequence of XOR operations follows, one over 16 bytes and the rest over single bytes. Its starts with xmm0
being XOR’d with the 128-bit xmmword at data_404080
, which points to the string "dG46rskj8-457(~:"
. After this initial wide XOR, the program extracts single bytes from the xmm0
register with pextrb
(extracts a single byte at the given offset), and using eax
as an XOR “accumulator” combines the extracted bytes together:
0040154f pxor xmm0, xmmword [data_404080] ; XORs this "secret" string with our first 16 bytes
00401558 xor eax, eax ; resets eax
0040155a pextrb r8d, xmm0, 0x0 ; r8d <- input[0] ^ secret[0]
00401561 xor eax, r8d ; eax ^= r8d
00401564 pextrb r8d, xmm0, 0x1 ; r8d <- input[1] ^ secret[1]
0040156b xor eax, r8d ; eax ^= r8d
0040156e pextrb r8d, xmm0, 0x2 ; ... the same
00401575 xor eax, r8d
00401578 pextrb r8d, xmm0, 0x3
0040157f xor eax, r8d
00401582 pextrb r8d, xmm0, 0x4
00401589 xor eax, r8d
0040158c pextrb r8d, xmm0, 0x5
00401593 xor eax, r8d
00401596 pextrb r8d, xmm0, 0x6
0040159d xor eax, r8d
004015a0 pextrb r8d, xmm0, 0x7
004015a7 xor eax, r8d
004015aa pextrb r8d, xmm0, 0x8
004015b1 xor eax, r8d
004015b4 pextrb r8d, xmm0, 0x9
004015bb xor eax, r8d
004015be pextrb r8d, xmm0, 0xa
004015c5 xor eax, r8d
004015c8 pextrb r8d, xmm0, 0xb
004015cf xor eax, r8d
004015d2 pextrb r8d, xmm0, 0xc
004015d9 xor eax, r8d
004015dc pextrb r8d, xmm0, 0xd
004015e3 xor eax, r8d
004015e6 pextrb r8d, xmm0, 0xe
004015ed xor eax, r8d
004015f0 pextrb r8d, xmm0, 0xf ; all the way to the 16th byte
004015f7 xor eax, r8d ; all XOR'd together
004015fa cmp eax, 0x0 ; and if all of that is not zero
004015fd jne sub_40168a ; then jump somewhere.
To think about these bytes, we have to consider the endianness of the plaform: with input[0:16]
XOR dG46rs…
stored in xmm0
, using pextrb
to extract a byte at a specific offset will select that byte starting from the least significant one. Here, the instruction at 0x0040155a
that pulls byte zero with pextrb r8d, xmm0, 0x0
is setting r8d
to input[0]
XOR secret[0]
.
A quick look at sub_40168a
tells us what we suspected: it prints an error message and exits:
0040168a mov rax, 0x1 ; system call 1: sys_write
00401691 mov rdi, 0x1 ; fd=1, that's stdout
00401698 mov rsi, data_404060 {"Failed to validate password\n"}
0040169f mov rdx, 0x1c ; 0x1c is the message length, 28 bytes
004016a6 syscall ; prints it
004016a8 mov rax, 0x3c ; syscall 0x3c is sys_exit
004016af mov rdi, 0x1 ; with status code 1.
004016b6 syscall ; bye
From this point, I’ve renamed sub_40168a
to print_failed_to_validate_and_exit
to make it more visible.
Now that we have our first constraint to apply to the first 16 bytes of input, let’s continue back after the jne
that started our detour.
A second constraint, now on xmm1
¶
00401603 mov rax, 0x11 ; rax is set to 0x11
0040160a add rax, rax ; doubled
0040160d xor r8, r8 ; r8 is reset to zero
00401610 pextrb r8d, xmm0, 0xb ; r8d <- input[11] ^ secret[11]
00401617 imul rax, r8 ; rax <- rax * r8
0040161b add rax, rax ; rax <- rax * 2
0040161e add rax, rax ; rax <- rax * 2
00401621 add rax, 0x13a ; rax <- rax + 0x13a
00401627 movupd xmmword [data_4040b0], xmm1 ; copies xmm1 into this static variable
00401630 xor rcx, rcx ; resets rcx (used below as a counter)
00401633 mov r8, 0x4040b0 ; r8 <- 0x4040b0 (the address of the variable we just wrote xmm1 to)
0040163a xor rbx, rbx ; resets rbx
0040163d mov bl, byte [r8+rcx] ; takes one byte from xmm1[counter] <――╮
00401641 sub rax, rbx ; removes it from rax │
00401644 inc rcx ; increments the counter │
00401647 cmp ecx, 0x10 ; if not equal to 16 │
0040164a jne 0x40163d ; loop back up ―――――――――――――――――――――――╯
0040164c cmp rax, 0x0
00401650 jne print_failed_to_validate_and_exit
Side-note: r8d
and r8b
are to r8
what eax
and al
are to rax
.
To recap in pseudo-code:
// we start by computing a value based on input[11] ^ secret[11]
rax = 0x11 * 2;
r8d = input[11] ^ secret[11];
rax *= r8; // multiplied
rax += rax; // doubled
rax += rax; // doubled again
rax += 0x13a;
// equivalent to: rax = (0x88 * (input[11] ^ secret[11]) + 0x13a)
ptr = xmm1 (== &input[16]);
counter = 0;
do { // for each of the next 16 bytes (xmm1 goes from 16 to 31)
r8 = ptr;
rbx = 0;
bl = ptr[counter]; // we take byte[i]
rax -= bl; // and remove it from the computed value rax
counter++;
} while (counter != 16);
if (rax != 0) { // they must match
print_failed_to_validate_and_exit();
}
This gives us another constraint on the input.
Escaping the error message ¶
Looking at the instructions below this point, we start to notice an impasse: the jumps to print_failed_to_validate_and_exit
were just jumps, not function calls. And we’re fast approaching this location, in fact there are no conditional jumps between the last jne print_failed_to_validate_and_exit
we saw at 0x00401650
and the error message print itself. It almost feels like we went down the wrong path, since we clearly have no way out.
Here’s where we are, in context:
With seemingly no way to avoid the error message, let’s take a closer look at the few instructions ahead of us. Starting at 0x00401652
:
00401652 movq rax, xmm3 ; movq takes the lower 64 bits from xmm3
00401657 pextrq rbx, xmm2, 0x1 ; extracts 64 bits from byte 1 of xmm2, into rbx
0040165e xor rax, rbx ; xors the two together, stores the result in rax
00401661 mov rbx, 0xffff231203 ; puts this long constant into rbx (no clear reason for this)
0040166b pextrq r8, xmm3, 0x1 ; extracts 64 bits again from offset 1 in xmm3 into r8
00401672 movq r9, xmm2 ; and the lower 64 bits of xmm2 into r9
00401677 mov qword [data_4040b0], r9 ; saves r9 to a static variable
0040167f mov qword [data_4040a8], r8 ; and r8 to another
00401687 div rbx ; *divides* (rdx:rax) by rbx(!)
; falls through into print_failed_to_validate_and_exit
There’s clearly no jump here, we just continue on to the error message. The only way I can think of that would cause us to bail from these few instructions would be for one of them to trigger a fault of some kind. One approach would be to access memory that’s unreachable, but this code doesn’t compute any pointers to write to or read from.
Our only remaining option is with the last instruction div rbx
, which divides rdx:rax
by rbx
; we could try to trigger a SIGFPE
(arithmetic exception) by manipulating rdx
and/or rax
so that it ends up overflowing. Despite its name meaning “signal floating-point error”, SIGFPE
can be thrown here during this integer division, in one of two cases (Intel reference manual, Vol. 2A 3-317 “Unsigned Divide”):
- If the denominator is zero, which we know is impossible here since we have a
mov rbx, 0xffff231203
just above. - If the result of the division doesn’t fit in 64 bits. Keep in mind that it’s not just
rax
that’s being divided byrbx
, butrdx:rax
. This syntax describes the concatenation over 128 bits of registersrdx
andrax
to create the numerator.
For example, if we haverdx = 0x1234
,rax = 0x5678
, andrbx = 0xabcdef
, what we’re computing is((rdx << 64) | rax) / rbx
, or0x12340000000000005678 / 0xabcdef
which produces the quotient0x1b1fb592b526e1
inrax
and the remainder0x73dd69
inrdx
. There’s no error since the quotient fits in 64 bits.
But if we have a larger value inrdx:rax
and the quotient is too large to write torax
, for example if we kept the same values except forrdx = 0xffffffff
, the quotient would be0x17d74fd494920f78247
which clearly doesn’t fit in 64 bits. This would trigger aSIGFPE
signal.
Let’s see what this means about our input. Even if we somehow managed to make rax
as large as possible (all bits set), the division by the constant in rbx
would still succeed if rdx
was left to zero. We can try to find some instructions that manipulate bits of rdx
based on our input, and set enough bits that rdx:rax
becomes large enough before the division. But going back to the caller, there really aren’t that many places that touch rdx
; the only one I could find was at 0x04010e0
with a mov rdx, qword [stdin]
but this is just setting up a parameter for fgets
.
A quick side note about pextrq
: here the offset is a qword offset and not a byte offset as in the sequence of pextrb
instructions above. The documentation for PEXTRQ r/m64, xmm2, imm8
says: “Extract a qword integer value from xmm2
at the source qword offset specified by imm8
into r64
”. With a source qword offset of 1, we copy the high (most significant) 64 bits of xmm2
; if it was zero it would be the low (least significant) 64 bits.
A register left to chance ¶
It doesn’t seem to me that there is any way to control the value of rdx
from the input. While investigating this issue I noticed a difference between the behavior of the program when I ran it within a container vs directly in a virtual machine. Both systems were x86-64 running Debian 11 “Bullseye”, with the same password input. I set a breakpoint on the div
instruction at 0x401687
with b *0x401687
in GDB, and looked up the registry values at that time. In the container I had a small value for rdx
:
$rax : 0x0
$rbx : 0xffff231203
$rcx : 0x10
$rdx : 0x30
while in the VM it was a much larger number:
$rax : 0x0
$rbx : 0xffff231203
$rcx : 0x10
$rdx : 0x7fffffffdb70
With such a large value in rdx
, the result of the division by rbx
does not fit in 64 bits:
>>> rdx = 0x7fffffffdb70
>>> rax = 0
>>> rbx = 0xffff231203
>>> hex(((rdx << 64) | rax) // rbx)
'0x80006e77394526c345'
From what I can tell, it looks like whether the input is accepted as valid depends on an assumption regarding the value of rdx
once we get to the div rbx
instruction. Without a way to influence rdx
from our inputs, this means that the validity of the password is likely platform-dependent.
In my case, I had started looking at this div
with assumptions based on a misunderstanding: I mistakenly thought that the div rbx
instruction would divide by rax
instead of dividing (rdx:rax)
. This led me to try to set rax
to zero, in order to divide by zero and trigger the signal. This is actually unnecessary, but for the record this is how I went about it:
<unnecessary-detour>
Here’s what a 128-bit SSE register would contain for the given input "hello, RE world!"
, as well as which parts are read by pextrb
and pextrq
commands:
In pseudo-code, this is what we did:
uint8_t rax[8];
uint8_t rbx[8];
memcpy(rax, xmm3, sizeof(uint64_t));
memcpy(rbx, xmm2 + 1 * sizeof(uint64_t), sizeof(uint64_t));
for (int i = 0; i < 8; i++) {
rax[i] ^= rbx[i];
}
Recall that xmm2
was set with movupd xmm2, xmmword [rdi+0x20]
and xmm3
with movupd xmm3, xmmword [rdi+0x30]
. In xmm2
we have input[32:47]
(16 bytes), and in xmm3
we have input[48:63]
.
rax
is therefore the XOR’d result of input[48:55]
with input[40:47]
. This is our new input constraint: having both of these sequences contain the same bytes would be enough to set rax
to zero and trigger a (edit: this is not what SIGFPE
in div rbx
div rbx
does).
</unnecessary-detour>
So all we need for this section is to run the binary on a machine where rdx
will be set to a large value somewhere during the execution of the program. Whether this will happen likely depends on a number of factors. My attempts at running this in Docker from a variety of base images have all failed, but I was able to get a large value in rdx
consistently by running the binary in a Google Cloud Shell.
From this point going forward I will assume that we’ve triggered the SIGFPE
, meaning that we’re running the binary on a machine where rdx
ends up having a large value when the div rbx
instruction is reached.
Discovering the signal handler ¶
While this error prevents us from continuing through to the error message, we still have to figure out where we end up. How was the signal even caught, did we miss something in main
? I had first opened this executable with Binary Ninja, which identifies the code at 0x004010b0
as being the main
function. But if we’d just had a quick glance a few lines above main
, we would have seen a signal handler being installed for signal 8 (SIGFPE
):
00401090 lea rsi, [rel sub_4013e0] ; presumably a signal handler
00401097 mov edi, 0x8 ; signal 8, SIGFPE
0040109c jmp signal ; note the `jmp`, not `call`
It is interesting that these instructions are not reachable from main
, but instead seem to be configured as a prelude to main
itself. This sort of trickery can let developers run code before main
without making it too obvious, and I believe this is the first time I’ve analyzed a crackme using this technique. With GCC, doing so can be achieved with a function attribute like constructor
which causes the function to be called automatically before execution enters main()
.
The code jumps to a call
instruction for the signal
libc function (see man page here). We see once again edi
and rsi
used as first and second parameters; this matches the expected order in which arguments are passed in to sig_t signal(int sig, sig_t func)
.
If we’re hoping to trigger a SIGFPE
, the next logical step is to look at the handler that was registered and see where we land.
At first glance, sub_4013e0
already looks promising. We can spot references to data_4040b0
and data_4040a8
(the two variables we had written to before the division), and there are also a bunch of transformations on them using xor
and sub
. We have a straight jmp
to print_failed_to_validate_and_exit
, and a few conditional jumps including one to 0x40143d
where some instructions refer to the string "You found the password:\n\t"
at 0x00402010
. We might be close!
Let’s take a step back to review what we had written to data_4040b0
and data_4040a8
:
00401661 mov rbx, 0xffff231203 ; puts this long constant into rbx
0040166b pextrq r8, xmm3, 0x1 ; extracts 64 bits again from qword offset 1 in xmm3 into r8
00401672 movq r9, xmm2 ; and the lower 64 bits of xmm2 into r9
00401677 mov qword [data_4040b0], r9 ; saves r9 to a static variable
0040167f mov qword [data_4040a8], r8 ; and r8 to another
This is approximately equivalent to:
uint64_t rbx, r8, r9;
uint8_t xmm2[16], xmm3[16]; // 128-bit registers
rbx = 0xffff231203; // doesn't really matter since it's unused
memcpy(&r8, xmm3 + sizeof(uint64_t), sizeof(uint64_t)); // quadword copy at offset 1 qword (so 8 bytes)
memcpy(&r9, xmm2, sizeof(uint64_t)); // quadword copy from the lowest 64 bits
*((uint64_t*)0x004040b0) = r9;
*((uint64_t*)0x004040a8) = r8;
Now that we’re entering sub_4013e0
, let’s see what’s between us and the success message. There are 6 conditional jumps, and if we managed to take the right ones and skip the others, it would lead us straight to the successful end of this program:
004013e0 mov rdi, qword [rel data_4040b0]
004013e7 mov r8, qword [rel data_4040a8]
004013ee push rbp ; saves rbp
004013ef cmp rdi, r8 ; if the two variables match
004013f2 je 0x4014c0 ; jumps to error message
004013f8 cmp rdi, 0xf ; if rdi > 15
004013fc ja 0x401490 ; calls `sub_401200`, changes eax, and continues at 00401413 (label A)
00401402 sub rdi, 0x1 ; rdi--
00401406 xor r9d, r9d ; resets r9d
00401409 cmp rdi, 0xe ; if rdi <= 14
0040140d jbe 0x4014b0 ; sets r9d to a value from an array, then jumps to label A
; (label A is here)
00401413 cmp r8, 0xf ; if r8 > 15
00401417 ja 0x401430 ; jumps to label D
00401419 sub r8, 0x1 ; r8--
0040141d xor eax, eax ; resets eax
0040141f cmp r8, 0xe ; if r8 <= 14
00401423 jbe 0x4014a0 ; sets eax to a value from an array, then jumps to label B
; (label B is here)
00401425 cmp eax, r9d
00401428 je 0x40143d ; jump to label E: password found!
; (label C is here)
0040142a pop rbp ; restores rbp
0040142b jmp print_failed_to_validate_and_exit
; (does not return)
; (label D is here)
00401430 mov rdi, r8 ; copies value
00401433 call sub_401200 ; calls this mystery function again
00401438 cmp eax, r9d ; if eax != r9d
0040143b jne 0x40142a ; then jump to label C
; (label E is here)
0040143d mov edx, 0x19 ; third argument: length=25
00401442 lea qword rsi, [0x402010] ; second argument: "You found the password:\n\t"
00401449 mov edi, 0x1 ; first argument: stdout
0040144e call write ; calls write(stdout, "You found…", 25)
00401453 mov rbp, qword [0x4040a0] ; points to our input
0040145a mov rdi, rbp ; first (and only) argument of strlen
0040145d call strlen ; calls strlen on our input, result goes into eax
00401462 mov rsi, rbp ; second argument: our input
00401465 mov edi, 0x1 ; first argument: stdout
0040146a mov rdx, rax ; third argument: the result of strlen
0040146d call write ; calls write(stdout, input, strlen(input))
00401472 mov edi, 0x1 ; first argument: stdout
00401477 mov edx, 0x1 ; third argument: length=1
0040147c lea rsi, qword [0x40202a] ; points to a single '\n' character
00401483 call write ; calls write(stdout, "\n", 1)
00401488 xor edi, edi ; resets edi
0040148a call exit ; calls exit(0)
Recall that the data at data_4040b0
was set to input[32:40]
(via r9
) and the data in data_4040a8
was set to input[56:63]
(via r8
).
The two short jumps at 0x004013fc
and 0x0040140d
refer to 0x00402040
, and both are for conditions where a register is less than 15. The data at that address seems to contain integers: just by looking at a few bytes there, we can recognize it as an array of int
. We can tell that it contains 15 values since exactly 60 bytes after it starts we have a header for an exception handler frame.
This is what’s at 0x00402040
:
00402040 01 00 00 00 01 00 00 00 02 00 00 00 01 00 00 00 02 00 00 00
00402054 02 00 00 00 03 00 00 00 01 00 00 00 02 00 00 00 02 00 00 00
00402068 03 00 00 00 02 00 00 00 03 00 00 00 03 00 00 00 04 00 00 00
Which decodes to:
int32_t data_402040[] = { 1, 1, 2, 1, 2,
2, 3, 1, 2, 2,
3, 2, 3, 3, 4 };
There are multiple calls to sub_401200
, but let’s put off exploring this long function before we have a better understanding of where we are now.
Using Ghidra, we can decompile the current block at 0x004013e0
into a rough estimate of what the code looks like, and refine it into something more readable. After renaming a few variables and referencing the integer array, I ended up with this listing:
void sub_004013e0() {
int validation_1;
int validation_2;
size_t len;
uint32_t var_r8;
uint32_t var_rdi;
char *ptr;
// reading back the two values we saved earlier
var_r8 = *(0x004040a8);
var_rdi = *(0x004040b0);
// the two can't be the same
if (var_rdi == var_r8) {
print_failed_to_validate_and_exit();
}
// first validation, with `rdi`
if (var_rdi < 16) { // unlikely with character input
validation_1 = 0;
if (var_rdi - 1 < 15) {
validation_1 = data_402040[var_rdi - 1];
}
} else { // so we'll probably call `sub_401200`
validation_1 = sub_401200(var_rdi);
}
// second validation, with `r8` this time but pretty much the same
if (var_r8 < 16) { // also difficult with ascii input
validation_2 = 0;
if (var_r8 - 1 < 15) {
validation_2 = data_402040[var_r8 - 1];
}
if (validation_2 == validation_1) { // success
success:
write(1,"You found the password:\n\t",0x19);
ptr = input_backup;
len = strlen(input_backup);
write(1,ptr,len);
write(1,&new_line,1);
exit(0);
}
} else { // we're likely to end up here instead
validation_2 = sub_401200(var_r8); // calling this function again, with r8 this time
if (validation_2 == validation_1) goto success; // and hopefully jump into the success block
}
print_failed_to_validate_and_exit();
}
With this listing, it seems at first like there might be a direct path to the success message, depending on what ends up in validation_1
and validation_2
using var_rdi
and var_8
as offsets into the integer array data_402040
.
As the very first step, we make sure that the values of rdi
and r8
are different; if they are both strictly less than 16 and both point to the same integer in the array, then we’d have a match. This doesn’t really seem possible: we pulled these two values as “quad-words” (64-bit) from xmm2
and xmm3
, which contain some of our input bytes. We’d need them to be mostly null, with maybe a single remaining byte having a value under 16.
Therefore, our only alternative seems to be to match the return values of the two calls to sub_401200
, saved in validation_1
and validation_2
. If this was a hash function, then using the same input would get us to the same output, but here we have this additional constraint that rdi
and r8
cannot be equal.
Since it looks like we don’t have a choice, let’s explore sub_410200
and figure out how we can make it return the same value twice with different inputs.
A “hash” function ¶
sub_410200
starts by taking its parameter rdi
and copying it to rcx
and rdx
:
00401200 mov rcx, rdi
00401203 mov rdx, rdi
After these two instructions, 16 blocks follow with a very similar structure to each other. They all differ by a small amount given by a constant, so this looks very much like an unrolled loop. The constant is unused in the first block, then has value 4 in the second, 8 in the third… up to 0x3c
or 60 in the last. We’re counting from 0 to 64 in steps of 4. Let’s look at the third block (i=2
) to understand the overall structure:
0040123d mov rcx, rdx ; restores our input (originally rdi) into rcx
00401240 shr rcx, 0x8 ; shift right by 8 (which is 4 times 2 if 2 is a block counter from 0 to 16)
00401244 and ecx, 0xf ; in the bottom 32 bits, keep only the lowest 4 (the high-32 are unchanged)
00401247 sub rcx, 0x1 ; remove 1
0040124b cmp rcx, 0xe ; compare to 14
0040124f ja 0x40125b ; if strictly greater, skip over the next two instructions
00401251 lea rsi, [rel data_402040] ; if it was lower or equal, then use the `data_402040` array
00401258 add eax, dword [rsi+rcx*4] ; and add the value at (integer) offset rcx to eax.
Converted back to a loop, we have something like:
uint64_t sub_401200(uint64_t rdi) {
uint64_t rdx = rdi; // back up the input
uint64_t eax = 0; // reset the accumulator
for (int i = 0; i < 16; i++) {
uint64_t rcx = rdx;
rcx >>= (i * 4); // shift by 4 more bits in each loop
rcx &= 0xf; // mask: keep only the 4 least significant bits
rcx--; // decrement
// use rcx as an index into the array
if (rcx <= 14) {
eax += data_402400[rcx]; // add this number to eax
}
}
return eax;
}
We can think of this loop as doing 16 passes over the 16 “hex digits” of rdi
, where for each one it’ll decrement the “digit” by one and use this value to read the number at that offset in the array, summing them all.
If our input was 0x1122334455667788
, then we’d compute data_402400[0]
+ data_402400[0]
+ data_402400[1]
+ … + data_402400[7]
.
This is our last condition: running this algorithm on var_r8
and var_rdi
must produce the same value, which seems feasible given that the array data_402400
has many duplicate values. In fact we don’t even have to compute them by hand: if their values are an anagram of each other in hex, then we’ll compute the same offsets into the array and add those values to the running total in eax
. Even just swapping two characters would be enough.
Putting it all together ¶
Let’s review the constraints we found to see if we can come up with an input string that satisfies them all.
Given secret
= "dG46rskj8-457(~:"
,
- The first 16 characters first XOR’d with
secret
and then together must be zero - The sum of the next 16 bytes must equal
(0x88 × (input[11] ^ secret[11]) + 0x13a)
The sequence from 48 to 55 must match 40 to 47 for us to trigger the SIGFPE
(as we saw this isn’t actually required)sub_401200(input[32:47])
must be equal tosub_401200(input[48:63])
, butinput[32:47]
must not be equal toinput[48:63]
An easy way to satisfy the first constraint is to use secret
as the first 16 bytes, but this would give us input[11] ^ secret[11]
= 0
, so the sum of the next 16 bytes would have to equal 0x13a
, which is only 314 – an average of 19 per byte and far from printable characters.
Let’s first compute all the possible values for this sum:
values = set()
secret = "dG46rskj8-457(~:"
for a in map(ord, secret):
for b in map(ord, secret):
total = 0x88 * (a ^ b) + 0x13a
values.add((chr(a), chr(b), total))
Then see if one of the totals is easily reachable using 16 characters with 2 different values:
>>> list(filter(lambda x: x[1] == '5' and chr(x[2]//16).isascii(), values))
[('7', '5', 586), ('6', '5', 722), ('4', '5', 450), ('5', '5', 314)]
722 is 16 × 45 + 2, with 45 being the ascii code for '-'
and 46 the one for '.'
; 16 dashes would sum up to 720, but replacing two of them with dots brings us to 722:
>>> sum(map(ord, ('-' * 14 + '.' * 2)))
722
>>> sum(map(ord, ('-' * 15 + '/'))) # there are
722
>>> sum(map(ord, (',' * 14 + '55'))) # countless
722
>>> sum(map(ord, ('+' * 12 + '1346'))) # other options…
722
Our target is 722 if input[11]
and secret[11]
are '6'
and '5'
, respectively.
We could try having input
being almost the same as secret
, except with characters at offsets 3
and 11
swapped:
index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
input | d | G | 4 | 5 | r | s | k | j | 8 | - | 4 | 6 | 7 | ( | ~ | : |
secret | d | G | 4 | 6 | r | s | k | j | 8 | - | 4 | 5 | 7 | ( | ~ | : |
The first constraint is satisfied:
>>> input0_16 = "dG45rskj8-467(~:"
>>> secret = "dG46rskj8-457(~:"
>>> from functools import reduce
>>> reduce(lambda a,b: a^b, [ord(secret[i]) ^ ord(input0_16[i]) for i in range(16)])
0
and so is the second:
>>> input16_32 = '-' * 14 + '.' * 2
>>> sum(map(ord, input16_32))
722
>>> sum(map(ord, input16_32)) == 0x88 * (ord(input0_16[11]) ^ ord(secret[11])) + 0x13a
True
Here’s what it looks like for the remaining constraints:
For the third one, we can pick anything we want as long as the right ranges match. In xmm2
:
index | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
input | ? | ? | ? | ? | ? | ? | ? | ? | A | B | C | D | E | F | G | H |
and xmm3
:
index | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
input | A | B | C | D | E | F | G | H | ? | ? | ? | ? | ? | ? | ? | ? |
(the question marks ?
are placeholders and could be any character, for now).
To this we’ll need to add the last constraint, which is to match the “hash” produced by sub_401200
. Since this hash is computed by using each “hex digit” as an index into an array, we can have two different letters swapped and the sum will end up being the same.
If we use the characters 1
and 2
(ASCII 0x31
& 0x32
) with 12
in the first string and 21
in the other, the “hex letters” will be [3,1,3,2]
for 12
and [3,2,3,1]
for 21
. After decrementing at 0x00401247
, the hash will be computed from the offsets [2,0,0,1]
in one case and [0,1,2,0]
in the other. The sum will be the same.
We can keep our question marks but change [32:33]
to 12
and [56:57]
to 21
so that input[32:39]
is not equal to input[56:63]
:
index | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
input | 1 | 2 | ? | ? | ? | ? | ? | ? | A | B | C | D | E | F | G | H |
and
index | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
input | A | B | C | D | E | F | G | H | 2 | 1 | ? | ? | ? | ? | ? | ? |
All together:
$ ./cm001
Input:
dG45rskj8-467(~:--------------..12??????ABCDEFGHABCDEFGH21??????
You found the password:
dG45rskj8-467(~:--------------..12??????ABCDEFGHABCDEFGH21??????
Of course, changing the 12
/21
to any other pair of swapped characters works just fine, and there are many more changes we could make within the range of these constraints. They are flexible enough to allow a whole range of valid inputs:
$ ./cm001
Input:
dG45rskj8-476(~:$$$$$$%%%%%%%%%%https://re.kv.iore.kv.iothtps://
You found the password:
dG45rskj8-476(~:$$$$$$%%%%%%%%%%https://re.kv.iore.kv.iothtps://
Closing thoughts ¶
I really enjoyed reversing this binary! It had a good combination of challenges, and the signal handler trick with SIGFPE
was something I had never seen in a crackme. This is the first post where I attached handwritten notes; I hope these are as useful to readers as they are to me.
June 6, 2022 update: The crackme author
JCWasmx86 got in touch, and provided a link to a recently published
GitHub repo containing the source code for it. Note that the binary needs to be built with -O3
for it to behave as expected.
While it’s great to have the source code, I would recommend only looking at it once you understand the disassembled binary.
June 23, 2022 update: While discussing this crackme with the author, I realized that the “valid” solutions I had found were not accepted when I ran the program in a container. This led me to find the mistake I had made about div rbx
not actually dividing by rax
, and to discover that whether a solution is accepted is partially dependent on assumptions from the author regarding the value of the rdx
register when the div
instruction is executed. Interestingly, the author stated that in their experience the -O3
build flag had an impact on the value taken by the rdx
register before the div
instruction.
For the sake of transparency, a past copy of this article with the error is still readable thanks to the Internet Archive.