Unicode: fix reading buffer overflow in valid_utf8() #5531

Open
wants to merge 5 commits into
base: bleeding-jumbo

AlekseyCherepanov
Member

valid_utf8() reads up to 2 bytes past the end of the string because it checks the bytes of a sequence right-to-left. I switched the order to left-to-right, which means the inner switch has to be split. I also moved a few ifs around: < 0xC2 is now checked before the main switch, so the switch only handles almost-valid starting bytes, and > 0xF4 was moved into case 4 because the other cases do not need it (in both versions).
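To illustrate why the read order matters, here is a minimal Python model; it is not the code from unicode.c. It assumes the old order tests the last continuation byte of a sequence first and the new order tests the second byte first. `seq_length`, `is_trailing` and `highest_read` are hypothetical helpers that only report the highest index each order touches in a NUL-terminated buffer.

```python
# Model of the two read orders for continuation bytes; not the john source.
# buf is a NUL-terminated byte string, i is the index of a lead byte.

def seq_length(lead):
    """Sequence length announced by a lead byte in 0xC2..0xF4."""
    if lead >= 0xF0:
        return 4
    if lead >= 0xE0:
        return 3
    return 2

def is_trailing(byte):
    return 0x80 <= byte <= 0xBF

def highest_read(buf, i, order):
    """Return the highest index read while checking the sequence at buf[i].

    order is "rtl" (last continuation byte first) or "ltr" (second byte first).
    Indexes at or beyond len(buf) model memory past the terminating NUL.
    """
    offsets = range(1, seq_length(buf[i]))
    if order == "rtl":
        offsets = reversed(offsets)
    highest = i
    for off in offsets:
        idx = i + off
        highest = max(highest, idx)
        # 0xAA is an arbitrary stand-in for whatever lies past the buffer.
        byte = buf[idx] if idx < len(buf) else 0xAA
        if not is_trailing(byte):
            break  # the sequence is invalid, stop reading
    return highest

buf = b"a\xf1\x00"  # truncated 4-byte sequence, NUL at index 2
print(highest_read(buf, 1, "rtl"))  # 4: two bytes past the NUL
print(highest_read(buf, 1, "ltr"))  # 2: stops at the NUL
```

In the left-to-right order the NUL itself fails the trailing-byte test, so the terminator acts as a natural stop and nothing beyond it is read.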

The diff looks terrible. I guess reviewing the full code would be easier.

To validate my changes, I wrote a Python script that calls the function. To make that possible, I compiled unicode.c alone as a shared library and loaded it with Python's ctypes. fake.c is needed to resolve symbols that belong to other files in john and are not used by the test.

The script:

# code to structure allowed utf-8 chars, test john

from ctypes import *

'''
int options;
int real_error;
int john_MD4_Final;
int john_MD4_Update;
int john_MD4_Init;
''' # ^ contents of fake.c

# gcc -fPIC -shared -Wl,-no-undefined john/src/unicode.c fake.c -o u.so
john_valid_utf8 = CDLL("./u.so").valid_utf8
john_valid_utf8.restype = c_int
john_valid_utf8.argtypes = [ c_char_p ]

def check_john(r, b):
    if not r:
        expected = 0
    elif len(b) == 1:
        expected = 1 # pure ASCII
    else:
        expected = 2 # valid non-ASCII: every multi-byte input here is a single character
    j = john_valid_utf8(b)
    assert j == expected, (j, expected, r, b)

# def check_john(r, b):
#     pass

def R(a, b):
    return list(range(a, b))

# Rough ranges
ascii    = R(0x0, 0x80)
trailing = R(0x80, 0xc0)
start2   = R(0xc0, 0xe0)
start3   = R(0xe0, 0xf0)
start4   = R(0xf0, 0x100)

not_trailing = ascii + start2 + start3 + start4
all_bytes = R(0x0, 0x100)

def is_valid(*args):
    b = bytes(args)
    r = None
    try:
        b.decode('utf-8')
        r = True
    except UnicodeDecodeError:
        r = False
    check_john(r, b)
    return r

def combine(args):
    r = [ 0, 0 ] # False and True are 0 and 1
    if len(args) == 1:
        for a in args[0]:
            r[is_valid(a)] += 1

    elif len(args) == 2:
        for a in args[0]:
            for b in args[1]:
                r[is_valid(a, b)] += 1

    elif len(args) == 3:
        for a in args[0]:
            for b in args[1]:
                for c in args[2]:
                    r[is_valid(a, b, c)] += 1

    elif len(args) == 4:
        for a in args[0]:
            for b in args[1]:
                for c in args[2]:
                    for d in args[3]:
                        r[is_valid(a, b, c, d)] += 1

    else:
        assert 0, 'wrong number of args'
    return r

def expect(count_false_and_true, *args):
    assert not (count_false_and_true[0] == -1 and count_false_and_true[1] == -1)
    for i in 0, 1:
        if count_false_and_true[i] == all:
            t = 1
            for a in args:
                t *= len(a)
            count_false_and_true[i] = t
    r = combine(args)
    assert r == count_false_and_true, (r, count_false_and_true)
    return r

# 1 byte
expect([ 0, len(ascii) ], ascii)
expect([ len(trailing), 0 ], trailing)
expect([ len(start2), 0 ], start2)
expect([ len(start3), 0 ], start3)
expect([ len(start4), 0 ], start4)

# 2 bytes starting with trailing
expect([ len(trailing) * len(all_bytes), 0 ], trailing, all_bytes)

# start2
# c0 and c1 are not in the range
expect([ 2 * len(trailing), 0 ], [ 0xc0, 0xc1 ], trailing)
expect([ 2 * len(trailing), (len(start2) - 2) * len(trailing) ], start2, trailing)

expect([ len(start2) * len(not_trailing), 0 ], start2, not_trailing)
expect([ len(start2) * len(trailing) ** 2, 0 ],
       start2, trailing, trailing)

# start3

# E0 80..9F X and ED A0..BF X are invalid
expect([ all, 0 ], [ 0xE0 ], R(0x80, 0xA0), all_bytes)
expect([ all, 0 ], [ 0xED ], R(0xA0, 0xC0), all_bytes)
r1 = expect([ all, 0 ], [ 0xE0 ], R(0x80, 0xA0), trailing)
r2 = expect([ all, 0 ], [ 0xED ], R(0xA0, 0xC0), trailing)
expect([ r1[0] + r2[0], len(start3) * len(trailing) ** 2 - r1[0] - r2[0] ],
       start3, trailing, trailing)

# start4

# F0 80..8F X Y and F4 90..BF X Y are invalid

expect([ 0, all ], [ 0xF0 ], R(0x90, 0xC0), trailing, trailing)
expect([ all, 0 ], [ 0xF0 ], R(0x80, 0x90), all_bytes, all_bytes)
r1 = expect([ all, 0 ], [ 0xF0 ], R(0x80, 0x90), trailing, trailing)

expect([ 0, all ], [ 0xF4 ], R(0x80, 0x90), trailing, trailing)
expect([ all, 0 ], [ 0xF4 ], R(0x90, 0xC0), all_bytes, all_bytes)
r2 = expect([ all, 0 ], [ 0xF4 ], R(0x90, 0xC0), trailing, trailing)

# F5..FF X Y Z are invalid
r3 = expect([ all, 0 ], R(0xF5, 0x100), trailing, trailing, trailing)

expect([ r1[0] + r2[0] + r3[0], len(start4) * len(trailing) ** 3 - r1[0] - r2[0] - r3[0] ],
       start4, trailing, trailing, trailing)

# expect([ len(start3) * len(all_bytes) * len(not_trailing), 0 ],
#        start3, all_bytes, not_trailing)
# expect([ len(start4) * len(all_bytes) ** 2 * len(not_trailing), 0 ],
#        start4, all_bytes, all_bytes, not_trailing)

@AlekseyCherepanov
Member Author

Minor note: gcc 12 expanded the switch for E0, ED, F0, F4 into a series of checks, so splitting it into 2 switch statements does not increase the size. In fact, the disassembly became smaller because case 1 is no longer reachable: previously, invalid trailing bytes at the start could land in this branch, but I moved their check out of the switch. I measure the size of the disassembly this way: objdump -d unicode.o | sed -ne '/<valid_utf8>:/,/^$/ p' | wc -l

valid_utf8() is used in multiple places including loader.c. Let's overread line_buf:

$ perl -C0 -e 'print ":12345" x 170 . "a\xf1"' > t.pw && ../run/john t.pw
=================================================================
==1583178==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffc92e219f0 at pc 0x55c00d0dc209 bp 0x7ffc92e21480 sp 0x7ffc92e21478
READ of size 1 at 0x7ffc92e219f0 thread T0
    #0 0x55c00d0dc208 in valid_utf8 .../src/unicode.c:554
    #1 0x55c00d0647b2 in read_file .../src/loader.c:255
    #2 0x55c00d06c368 in ldr_load_pw_file .../src/loader.c:1189
    #3 0x55c00d05a667 in john_load .../src/john.c:1149
    #4 0x55c00d05a667 in john_init .../src/john.c:1598
    #5 0x55c00d05a667 in main .../src/john.c:2084
    #6 0x7fdf45646249 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #7 0x7fdf45646304 in __libc_start_main_impl ../csu/libc-start.c:360
    #8 0x55c00cc0f1f0 in _start (.../run/john+0x1ff1f0)

Address 0x7ffc92e219f0 is located in stack of thread T0 at offset 1280 in frame
    #0 0x55c00d063f1b in read_file .../src/loader.c:209

  This frame has 2 object(s):
    [48, 192) 'file_stat' (line 210)
    [256, 1280) 'line_buf' (line 212) <== Memory access at offset 1280 overflows this variable
[...]
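The numbers in the report can be checked with a little arithmetic. The sketch below assumes that read_file stores the line at the start of line_buf followed by a NUL, and that the pre-patch valid_utf8() first reads the last byte of the 4-byte sequence announced by 0xF1; neither detail is shown in the report itself, and the variable names are only for illustration.

```python
# Worked arithmetic for the ASAN report above (assumptions in the lead-in).

line = b":12345" * 170 + b"a\xf1"   # same bytes as the perl one-liner
lead_index = len(line) - 1          # index of 0xF1 within line_buf
nul_index = len(line)               # terminator written right after the line
first_read = lead_index + 4 - 1     # last byte of the announced 4-byte sequence

line_buf_start, line_buf_end = 256, 1280    # frame offsets from the report
line_buf_size = line_buf_end - line_buf_start

print(len(line))                    # 1022
print(line_buf_size)                # 1024
print(first_read)                   # 1024, one past the last valid index 1023
print(line_buf_start + first_read)  # 1280, the offset ASAN reports
```

The read lands 2 bytes past the NUL at index 1022, which is exactly the first byte after line_buf.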

@solardiz
Member

Wow, that's an extensive test script you wrote. Maybe it should be made part of our unit tests invoked on make check, but only when Python is available?

> valid_utf8() is used in multiple places including loader.c. Let's overread line_buf:
> $ perl -C0 -e 'print ":12345" x 170 . "a\xf1"' > t.pw && ../run/john t.pw

I guess you mean with code from prior to this PR's changes, to reproduce the original bug?

@AlekseyCherepanov
Member Author

Yes, ASAN gets triggered prior to the patch. And it does not fire after the patch.

@AlekseyCherepanov
Member Author

The script is slow. It takes 70 seconds using PyPy3 and 57 seconds after compilation with cython3 (without changes to the script: cython3 --embed -3 test_uni2.py && gcc -I/usr/include/python3.11/ test_uni2.c -lpython3.11). I rewrote it in C to make it faster and more structured. It takes 3.7 seconds (I use sparse checks for some ranges; 13 seconds without them, but I still check more than the script). I'll add a commit tomorrow.

@solardiz
Member

> It takes 3.7 seconds (I use sparse checks for some ranges; 13 seconds without them, but I still check more than the script). I'll add a commit tomorrow.

I think you could try even sparser checks to get under 1 second. Thanks!

@magnumripper
Member

I had a look at this, and the few things I was worried about (such as 0xF4) were explained in the PR description, so I'm happy with it. Especially with that thorough test script! Also, I can't see any reason it would end up slower (perhaps faster).

@AlekseyCherepanov
Member Author

Nice, CI found problems with symbols referenced in unicode.o that I did not see locally. I can reproduce them: passing --without-openssl to ./configure is required for that. I'll fix it.

Meanwhile a few words about the commit:

I did not feel comfortable inserting a lot of code into unit-tests.c. But I needed functions to integrate with the report. So I included my file using #include. (Afterthought: _test_valid_utf8 could use a local counter for tests and return it, so it would be ok to compile it separately. OTOH I also use hex() from unit-tests.c.)

In regular make and Makefile.legacy, I compile with actual unicode.o instead of sources. Makefile.legacy required -lcrypto for MD4_Init and friends. I picked it from command in the regular build.

I think memory.c and common.c do not need separate compilation. _JOHN_MISC_NO_LOG is internal to misc.c. So there should not be actual deviations from regular compilation for these files and .o could be reused.

BTW, I have an idea that the switch could be replaced with explicit checks in the same "fall-through" style. Then the code would not need the lookup, while having the same number of checks in assembly.

ASAN build:

  test 38 test_valid_utf8         -  Performed 49022799 tests 0.76 seconds used

In regular build, the test takes under 0.4 seconds.

So I have 2 more changes in mind:

  • reuse .o for memory.c and common.c
  • replace lookup+switch with ifs on values

@AlekseyCherepanov
Member Author

> In regular make and Makefile.legacy, I compile with actual unicode.o instead of sources. Makefile.legacy required -lcrypto for MD4_Init and friends. I picked it from command in the regular build.

This part was wrong. Now I compile tests/unicode.o separately defining UNICODE_NO_OPTIONS and NOT_JOHN, so it does not use options and MD4_* and I don't need -lcrypto and/or fake symbols.

Guarding with #ifndef UNICODE_NO_OPTIONS was slightly incomplete. So one commit is dedicated to the fix.

@AlekseyCherepanov
Member Author

I use unicode.o as a dependency for tests/unicode.o as a shortcut to reuse all of its dependencies, including the unused options.h. I considered this a better solution than copy-pasting the dependencies with or without modifications. unicode.o should be built anyway in the usual places where unit-tests gets built.

Another way would be to add tests/unicode.o as a target next to unicode.o. Like this:

unicode.o tests/unicode.o: [...]

@AlekseyCherepanov
Member Author

And the last commit removes the lookup from valid_utf8() by rewriting the switch as explicit ifs. The code keeps a similar structure with "fall-through" semantics. (And it looks strange if I try to forget that it is a switch by nature.) Now the default case falls into the 4+ bytes branch and then gets filtered out by the > 0xF4 check.
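As a rough illustration of a validator built from explicit ifs with left-to-right reads, here is a Python sketch. It is not the code from the commit: `sequence_length` and `is_valid` are hypothetical names, it returns a plain boolean instead of the C function's return value, and it rejects bytes above 0xF4 up front rather than in the 4-byte branch.

```python
def sequence_length(buf, i):
    """Length of the valid UTF-8 sequence starting at buf[i], or 0 if invalid.

    buf must end with a NUL byte, as a C string would.
    """
    first = buf[i]
    if first < 0x80:
        return 1
    if first < 0xC2 or first > 0xF4:
        return 0                        # stray trailing byte, overlong lead, or too large
    a = buf[i + 1]                      # safe: first is not the NUL
    if a < 0x80 or a > 0xBF:
        return 0
    if first == 0xE0 and a < 0xA0:      # overlong 3-byte form
        return 0
    if first == 0xED and a > 0x9F:      # UTF-16 surrogates
        return 0
    if first == 0xF0 and a < 0x90:      # overlong 4-byte form
        return 0
    if first == 0xF4 and a > 0x8F:      # above U+10FFFF
        return 0
    if first < 0xE0:
        return 2
    if not 0x80 <= buf[i + 2] <= 0xBF:  # safe: buf[i + 1] was not the NUL
        return 0
    if first < 0xF0:
        return 3
    if not 0x80 <= buf[i + 3] <= 0xBF:  # safe: buf[i + 2] was not the NUL
        return 0
    return 4

def is_valid(data):
    """Validate data up to its first NUL, like a C string."""
    buf = data + b"\x00"
    i = 0
    while buf[i] != 0:
        n = sequence_length(buf, i)
        if n == 0:
            return False
        i += n
    return True

print(is_valid("naïve €".encode("utf-8")))  # True
print(is_valid(b"a\xf1"))                   # False, and nothing past the NUL is read
```

Each later byte is read only after the previous one passed the trailing-byte test, so the terminating NUL always stops the reads.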

I don't insist on this change.

But I have an idea for an even more obscure modification: the switch for E0/ED could be replaced with the following bit trick.

if (*source == (0xE0 | (0xED & -(a > 0x9F)))) return 0;

Since E0 | ED gives ED, it is possible to OR the values to get ED if the check is true, and E0 otherwise. Fortunately, the check for E0 (a < 0xA0) complements the check for ED (a > 0x9F), so we don't need to check twice. (To be honest, I cannot formulate a proof easily, even though I came up with this trick myself and checked it against the tests.) By just replacing the constants, a similar check can be written for F0/F4.
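The trick can be checked exhaustively. The sketch below compares it with a straightforward per-lead-byte reference in Python, where `-(cond)` is 0 or -1 (all bits set) just as in C. The function names are hypothetical, and the 0x90 boundary for F0/F4 is taken from the ranges listed in the test script (F0 80..8F and F4 90..BF are invalid).

```python
def invalid_by_cases(first, a):
    """Reference: restrictions on the second byte a for special lead bytes."""
    if first == 0xE0:
        return a < 0xA0
    if first == 0xED:
        return a > 0x9F
    if first == 0xF0:
        return a < 0x90
    if first == 0xF4:
        return a > 0x8F
    return False

def invalid_by_trick(first, a):
    """The same restrictions expressed with the mask trick."""
    # 0xE0's bits are a subset of 0xED's, so the OR yields 0xED or 0xE0.
    e = 0xE0 | (0xED & -(a > 0x9F))   # 0xED if a > 0x9F, else 0xE0
    # Likewise 0xF0 | 0xF4 == 0xF4.
    f = 0xF0 | (0xF4 & -(a > 0x8F))   # 0xF4 if a > 0x8F, else 0xF0
    return first == e or first == f

mismatches = [
    (first, a)
    for first in range(256)
    for a in range(256)
    if invalid_by_cases(first, a) != invalid_by_trick(first, a)
]
print(mismatches)  # []
```

The reason it works: for any a, exactly one of a < 0xA0 and a > 0x9F holds, so the mask selects the one lead byte that is invalid for that a, and a single comparison with *source covers both cases.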

A simpler bit trick is ((a = *++srcptr) & 0xC0) == 0x80 to check for trailing bytes: & 0xC0 picks the 2 highest bits, and == 0x80 checks that they are 10. It might reduce the number of branching instructions.
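A quick way to convince oneself that the mask is equivalent to the range check is to compare both over all 256 byte values. This is a small Python sketch with hypothetical function names, not code from the PR.

```python
def is_trailing_by_range(a):
    return 0x80 <= a <= 0xBF

def is_trailing_by_mask(a):
    # 0xC0 keeps the two highest bits; a continuation byte has them set to 10.
    return (a & 0xC0) == 0x80

same = all(is_trailing_by_range(a) == is_trailing_by_mask(a) for a in range(256))
count = sum(is_trailing_by_mask(a) for a in range(256))
print(same, count)  # True 64
```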

Some benchmarking is needed to evaluate the tricks. Also, I am not sure how they will play with future compiler optimizations/changes. So I am not going to commit them if they are not interesting/needed. Are they worth a benchmark?

@solardiz
Member

@AlekseyCherepanov Is this ready to merge?
@magnumripper Would you like to take another look before we merge?

@AlekseyCherepanov
Member Author

It is ready to be merged. I'd like to hear from @magnumripper about my changes to unicode.h and about the readability of the final code with ifs. I do not know whether my change to unicode.h is valid with respect to the other code behind encodings. The change is based on local context only.
