
Support tuples for [r]find() & [r]index() #2

Draft · wants to merge 135 commits into base: main

Conversation

@nineteendo nineteendo commented Jun 3, 2024

Motivation

For finding multiple substrings, there's currently no single algorithm that outperforms the others in most cases. Their performance varies significantly between the best and worst case, making it difficult to choose one:

| algorithm | loop in | loop startswith | re¹ | find str | unit |
| --- | --- | --- | --- | --- | --- |
| find chars best case | 1.00 | 1.19 | 1.56 | 1.37 | x 180 nsec |
| find chars mixed case | 1.00 | 1.22 | 1.54 | 91.01 | x 178 nsec |
| find chars worst case | 1262.20 | 1597.56 | 131.71 | 1.00 | x 32.80 usec |
| find subs best case | | 1.00 | 1.33 | 1.17 | x 212 nsec |
| find subs mixed case | | 1.00 | 1.30 | 3327.27 | x 220 nsec |
| find subs worst case | | 35.82 | 3.62 | 1.00 | x 1.46 msec |
| find many prefixes | | 1760.80 | 1.00 | 122.26 | x 301.00 usec |
| find many infixes | | 122.17 | 1.00 | 7.71 | x 4.33 msec |
| rfind chars best case | 1.00 | 2.61 | 4.34 | 1.96 | x 114 nsec |
| rfind chars mixed case | 1.00 | 2.66 | 4.34 | 4561.40 | x 114 nsec |
| rfind chars worst case | 50.10 | 55.25 | 6.94 | 1.00 | x 1.01 msec |
| rfind subs best case | | 1.58 | 2.55 | 1.00 | x 229 nsec |
| rfind subs mixed case | | 1.00 | 1.59 | 1921.05 | x 380 nsec |
| rfind subs worst case | | 38.07 | 4.97 | 1.00 | x 1.45 msec |
| rfind many suffixes | | 1094.46 | 1.00 | 50.51 | x 487.00 usec |
| rfind many infixes | | 54.70 | 1.00 | 2.41 | x 9.69 msec |

That's why I'm suggesting a dynamic algorithm that doesn't suffer from these problems:

| algorithm | loop in | loop startswith | re¹ | find str | find tuple | unit |
| --- | --- | --- | --- | --- | --- | --- |
| find chars best case | 2.67 | 3.19 | 4.15 | 3.64 | 1.00 | x 68 nsec |
| find chars mixed case | 2.29 | 2.80 | 3.53 | 208.23 | 1.00 | x 78 nsec |
| find chars worst case | 1489.21 | 1884.89 | 155.40 | 1.18 | 1.00 | x 27.80 usec |
| find subs best case | | 3.07 | 4.07 | 3.59 | 1.00 | x 69 nsec |
| find subs mixed case | | 2.32 | 3.02 | 7713.38 | 1.00 | x 95 nsec |
| find subs worst case | | 35.82 | 3.62 | 1.00 | 1.01 | x 1.46 msec |
| find many prefixes | | 1760.80 | 1.00 | 122.26 | 82.06 | x 301.00 usec |
| find many infixes | | 122.17 | 1.00 | 7.71 | 5.22 | x 4.33 msec |
| rfind chars best case | 1.33 | 3.45 | 5.76 | 2.59 | 1.00 | x 86 nsec |
| rfind chars mixed case | 1.08 | 2.86 | 4.67 | 4905.66 | 1.00 | x 106 nsec |
| rfind chars worst case | 50.10 | 55.25 | 6.94 | 1.00 | 1.10² | x 1.01 msec |
| rfind subs best case | | 3.70 | 5.98 | 2.34 | 1.00 | x 98 nsec |
| rfind subs mixed case | | 3.42 | 5.43 | 6576.58 | 1.00 | x 111 nsec |
| rfind subs worst case | | 38.07 | 4.97 | 1.00 | 1.04³ | x 1.45 msec |
| rfind many suffixes | | 1094.46 | 1.00 | 50.51 | 50.31 | x 487.00 usec |
| rfind many infixes | | 54.70 | 1.00 | 2.41 | 2.39 | x 9.69 msec |
algorithms

```python
# find_tuple.py
def find0(string, chars):
    for i, char in enumerate(string):
        if char in chars:
            break
    else:
        i = -1
    return i

def find1(string, subs):
    for i in range(len(string)):
        if string.startswith(subs, i):
            break
    else:
        i = -1
    return i

def find2(string, pattern):
    match = pattern.search(string)
    i = match.start() if match else -1
    return i

def find3(string, subs):
    i = -1
    for sub in subs:
        new_i = string.find(sub, 0, None if i == -1 else i + len(sub))
        if new_i != -1:
            i = new_i
    return i

def find4(string, subs):
    i = string.find(subs)
    return i

def rfind0(string, chars):
    i = len(string) - 1
    while i >= 0 and string[i] not in chars:
        i -= 1
    return i

def rfind1(string, subs):
    for i in range(len(string), -1, -1):
        if string.startswith(subs, i):
            break
    else:
        i = -1
    return i

rfind2 = find2

def rfind3(string, subs):
    i = -1
    for sub in subs:
        new_i = string.rfind(sub, 0 if i == -1 else i)
        if new_i != -1:
            i = new_i
    return i

def rfind4(string, subs):
    i = string.rfind(subs)
    return i
```
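As a quick cross-check, the loop-, regex-, and find-based variants above should agree on the position of the first match. A minimal self-contained sketch (the sample string here is made up for illustration; find1 behaves the same way via startswith, and find4/rfind4 are omitted because they rely on the proposed tuple support and only run on the patched build):

```python
import re

def find0(string, chars):
    # Linear scan: first index whose character is in `chars`
    for i, char in enumerate(string):
        if char in chars:
            return i
    return -1

def find2(string, pattern):
    # Compiled-regex variant
    match = pattern.search(string)
    return match.start() if match else -1

def find3(string, subs):
    # One str.find() per substring, keeping the earliest hit
    i = -1
    for sub in subs:
        new_i = string.find(sub, 0, None if i == -1 else i + len(sub))
        if new_i != -1:
            i = new_i
    return i

# 'b' occurs at index 50, 'a' at index 101; earliest match is 50
string = '_' * 50 + 'b' + '_' * 50 + 'a' + '_' * 50
assert find0(string, 'ab') == 50
assert find2(string, re.compile('[ab]')) == 50
assert find3(string, 'ab') == 50
```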
benchmark script

```shell
# find_tuple.sh
echo find chars best case
find-tuple/python.exe -m timeit -s "import find_tuple;     string = 'ab' + '_' * 999_998; chars   = 'ab'"               "find_tuple.find0(string, chars)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = 'ab' + '_' * 999_998; subs    = tuple('ab')"        "find_tuple.find1(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple, re; string = 'ab' + '_' * 999_998; pattern = re.compile('[ab]')" "find_tuple.find2(string, pattern)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = 'ab' + '_' * 999_998; subs    = 'ab'"               "find_tuple.find3(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = 'ab' + '_' * 999_998; subs    = tuple('ab')"        "find_tuple.find4(string, subs)"
echo find chars mixed case
find-tuple/python.exe -m timeit -s "import find_tuple;     string = 'b' + '_' * 999_999; chars   = 'ab'"               "find_tuple.find0(string, chars)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = 'b' + '_' * 999_999; subs    = tuple('ab')"        "find_tuple.find1(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple, re; string = 'b' + '_' * 999_999; pattern = re.compile('[ab]')" "find_tuple.find2(string, pattern)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = 'b' + '_' * 999_999; subs    = 'ab'"               "find_tuple.find3(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = 'b' + '_' * 999_999; subs    = tuple('ab')"        "find_tuple.find4(string, subs)"
echo find chars worst case
find-tuple/python.exe -m timeit -s "import find_tuple;     string = '_' * 1_000_000; chars   = 'ab'"               "find_tuple.find0(string, chars)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = '_' * 1_000_000; subs    = tuple('ab')"        "find_tuple.find1(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple, re; string = '_' * 1_000_000; pattern = re.compile('[ab]')" "find_tuple.find2(string, pattern)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = '_' * 1_000_000; subs    = 'ab'"               "find_tuple.find3(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = '_' * 1_000_000; subs    = tuple('ab')"        "find_tuple.find4(string, subs)"
echo find subs best case
find-tuple/python.exe -m timeit -s "import find_tuple;     string = 'abcd' + '_' * 999_996; subs    = 'ab', 'cd'"          "find_tuple.find1(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple, re; string = 'abcd' + '_' * 999_996; pattern = re.compile('ab|cd')" "find_tuple.find2(string, pattern)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = 'abcd' + '_' * 999_996; subs    = 'ab', 'cd'"          "find_tuple.find3(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = 'abcd' + '_' * 999_996; subs    = 'ab', 'cd'"          "find_tuple.find4(string, subs)"
echo find subs mixed case
find-tuple/python.exe -m timeit -s "import find_tuple;     string = 'cd' + '_' * 999_998; subs    = 'ab', 'cd'"          "find_tuple.find1(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple, re; string = 'cd' + '_' * 999_998; pattern = re.compile('ab|cd')" "find_tuple.find2(string, pattern)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = 'cd' + '_' * 999_998; subs    = 'ab', 'cd'"          "find_tuple.find3(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = 'cd' + '_' * 999_998; subs    = 'ab', 'cd'"          "find_tuple.find4(string, subs)"
echo find subs worst case
find-tuple/python.exe -m timeit -s "import find_tuple;     string = '_' * 1_000_000; subs    = 'ab', 'cd'"          "find_tuple.find1(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple, re; string = '_' * 1_000_000; pattern = re.compile('ab|cd')" "find_tuple.find2(string, pattern)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = '_' * 1_000_000; subs    = 'ab', 'cd'"          "find_tuple.find3(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = '_' * 1_000_000; subs    = 'ab', 'cd'"          "find_tuple.find4(string, subs)"
echo find many prefixes
find-tuple/python.exe -m timeit -s "import find_tuple;     string = '_' * 1_000_000; subs    = tuple(f'prefix{i}' for i in range(100))"                "find_tuple.find1(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple, re; string = '_' * 1_000_000; pattern = re.compile('|'.join(f'prefix{i}' for i in range(100)))" "find_tuple.find2(string, pattern)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = '_' * 1_000_000; subs    = tuple(f'prefix{i}' for i in range(100))"                "find_tuple.find3(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = '_' * 1_000_000; subs    = tuple(f'prefix{i}' for i in range(100))"                "find_tuple.find4(string, subs)"
echo find many infixes
find-tuple/python.exe -m timeit -s "import find_tuple;     string = '_' * 1_000_000; subs    = tuple(f'{i}infix{i}' for i in range(100))"                "find_tuple.find1(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple, re; string = '_' * 1_000_000; pattern = re.compile('|'.join(f'{i}infix{i}' for i in range(100)))" "find_tuple.find2(string, pattern)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = '_' * 1_000_000; subs    = tuple(f'{i}infix{i}' for i in range(100))"                "find_tuple.find3(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = '_' * 1_000_000; subs    = tuple(f'{i}infix{i}' for i in range(100))"                "find_tuple.find4(string, subs)"

echo ---

echo rfind chars best case
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 999_998 + 'ba'; chars   = 'ab'"                      "find_tuple.rfind0(string, chars)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 999_998 + 'ba'; subs    = tuple('ab')"               "find_tuple.rfind1(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple, regex; string = '_' * 999_998 + 'ba'; pattern = regex.compile('(?r)[ab]')" "find_tuple.rfind2(string, pattern)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 999_998 + 'ba'; subs    = 'ab'"                      "find_tuple.rfind3(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 999_998 + 'ba'; subs    = tuple('ab')"               "find_tuple.rfind4(string, subs)"
echo rfind chars mixed case
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 999_999 + 'b'; chars   = 'ab'"                      "find_tuple.rfind0(string, chars)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 999_999 + 'b'; subs    = tuple('ab')"               "find_tuple.rfind1(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple, regex; string = '_' * 999_999 + 'b'; pattern = regex.compile('(?r)[ab]')" "find_tuple.rfind2(string, pattern)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 999_999 + 'b'; subs    = 'ab'"                      "find_tuple.rfind3(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 999_999 + 'b'; subs    = tuple('ab')"               "find_tuple.rfind4(string, subs)"
echo rfind chars worst case
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 1_000_000; chars   = 'ab'"                      "find_tuple.rfind0(string, chars)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 1_000_000; subs    = tuple('ab')"               "find_tuple.rfind1(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple, regex; string = '_' * 1_000_000; pattern = regex.compile('(?r)[ab]')" "find_tuple.rfind2(string, pattern)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 1_000_000; subs    = 'ab'"                      "find_tuple.rfind3(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 1_000_000; subs    = tuple('ab')"               "find_tuple.rfind4(string, subs)"
echo rfind subs best case
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 999_996 + 'cdab'; subs    = 'ab', 'cd'"                 "find_tuple.rfind1(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple, regex; string = '_' * 999_996 + 'cdab'; pattern = regex.compile('(?r)ab|cd')" "find_tuple.rfind2(string, pattern)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 999_996 + 'cdab'; subs    = 'ab', 'cd'"                 "find_tuple.rfind3(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 999_996 + 'cdab'; subs    = 'ab', 'cd'"                 "find_tuple.rfind4(string, subs)"
echo rfind subs mixed case
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 999_998 + 'cd'; subs    = 'ab', 'cd'"                 "find_tuple.rfind1(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple, regex; string = '_' * 999_998 + 'cd'; pattern = regex.compile('(?r)ab|cd')" "find_tuple.rfind2(string, pattern)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 999_998 + 'cd'; subs    = 'ab', 'cd'"                 "find_tuple.rfind3(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 999_998 + 'cd'; subs    = 'ab', 'cd'"                 "find_tuple.rfind4(string, subs)"
echo rfind subs worst case
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 1_000_000; subs    = 'ab', 'cd'"                 "find_tuple.rfind1(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple, regex; string = '_' * 1_000_000; pattern = regex.compile('(?r)ab|cd')" "find_tuple.rfind2(string, pattern)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 1_000_000; subs    = 'ab', 'cd'"                 "find_tuple.rfind3(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 1_000_000; subs    = 'ab', 'cd'"                 "find_tuple.rfind4(string, subs)"
echo rfind many suffixes
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 1_000_000; subs    = tuple(f'{i}suffix' for i in range(100))"                            "find_tuple.rfind1(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple, regex; string = '_' * 1_000_000; pattern = regex.compile(f'(?r){'|'.join(f'{i}suffix' for i in range(100))}')" "find_tuple.rfind2(string, pattern)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 1_000_000; subs    = tuple(f'{i}suffix' for i in range(100))"                            "find_tuple.rfind3(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 1_000_000; subs    = tuple(f'{i}suffix' for i in range(100))"                            "find_tuple.rfind4(string, subs)"
echo rfind many infixes
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 1_000_000; subs    = tuple(f'{i}infix{i}' for i in range(100))"                            "find_tuple.rfind1(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple, regex; string = '_' * 1_000_000; pattern = regex.compile(f'(?r){'|'.join(f'{i}infix{i}' for i in range(100))}')" "find_tuple.rfind2(string, pattern)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 1_000_000; subs    = tuple(f'{i}infix{i}' for i in range(100))"                            "find_tuple.rfind3(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 1_000_000; subs    = tuple(f'{i}infix{i}' for i in range(100))"                            "find_tuple.rfind4(string, subs)"
```

Examples

```python
>>> "0123456789".find(("a", "b", "c"))
-1
>>> "0123456789".find(("0", "1", "2"))
0
>>> "0123456789".rfind(("7", "8", "9"))
9
```

Use cases

While I haven't found a lot of use cases, this new addition would improve the readability and performance of all of them:

cpython/Lib/ntpath.py, lines 238 to 240 @ 73906d5:

```python
i = len(p)
while i and p[i-1] not in seps:
    i -= 1
```

cpython/Lib/ntpath.py, lines 378 to 380 @ f90ff03:

```python
i, n = 1, len(path)
while i < n and path[i] not in seps:
    i += 1
```

cpython/Lib/genericpath.py, lines 164 to 167 @ 0d42ac9:

```python
sepIndex = p.rfind(sep)
if altsep:
    altsepIndex = p.rfind(altsep)
    sepIndex = max(sepIndex, altsepIndex)
```

More than 2,000 files on GitHub match the pattern `/max\(\w+\.rfind/`.
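For illustration, the genericpath.splitext idiom above can be wrapped in a helper (last_sep_index is a hypothetical name, not part of CPython):

```python
def last_sep_index(p, sep='/', altsep='\\'):
    # Current idiom: one rfind() per separator, combined with max()
    sep_index = p.rfind(sep)
    alt_index = p.rfind(altsep)
    return max(sep_index, alt_index)

# With the proposed tuple support this would collapse to a single call:
#     p.rfind((sep, altsep))

# The '/' at index 7 is the last separator of either kind
assert last_sep_index('dir\\sub/file.txt') == 7
```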

Python implementation

The implementation written in Python is clear and simple (in C, overflow and exceptions need to be handled manually):

```python
MIN_CHUNK_SIZE = 32
MAX_CHUNK_SIZE = 16384
EXP_CHUNK_SIZE = 2

def find_tuple(string, subs, start=0, end=None):
    end = len(string) if end is None else end
    result = -1
    chunk_size = MIN_CHUNK_SIZE
    chunk_start = start
    while True:
        chunk_end = min(chunk_start + chunk_size, end)
        if chunk_end < end:
            chunk_end -= 1
        for sub in subs:
            sub_end = min(chunk_end + len(sub), end)
            new_result = string.find(sub, chunk_start, sub_end)
            if new_result != -1:
                if new_result == chunk_start:
                    return new_result
                chunk_end = new_result - 1  # Only allow earlier match
                result = new_result
        if result != -1 or chunk_end >= end:
            return result  # Found match or searched entire range
        chunk_start = chunk_end + 1
        chunk_size = min(chunk_size * EXP_CHUNK_SIZE, MAX_CHUNK_SIZE)

def rfind_tuple(string, subs, start=0, end=None):
    end = len(string) if end is None else end
    result = -1
    chunk_size = MIN_CHUNK_SIZE
    chunk_end = end
    while True:
        chunk_start = max(start, chunk_end - chunk_size)
        if chunk_start > start:
            chunk_start += 1
        for sub in subs:
            sub_end = min(chunk_end + len(sub), end)
            new_result = string.rfind(sub, chunk_start, sub_end)
            if new_result != -1:
                if new_result == chunk_end:
                    return new_result
                chunk_start = new_result + 1  # Only allow later match
                result = new_result
        if result != -1 or chunk_start <= start:
            return result  # Found match or searched entire range
        chunk_end = chunk_start - 1
        chunk_size = min(chunk_size * EXP_CHUNK_SIZE, MAX_CHUNK_SIZE)
```

Explanation

The search is split into chunks that overlap by the length of a substring. After the first match, we search for the next substring in the part before it (or after it, for a reverse search). Each chunk is twice as large as the previous one, capped at 16384. The dynamic size ensures good best- and worst-case performance.
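The pure-Python reference above can be exercised directly. A quick sanity check (the constants and find_tuple are repeated here so the snippet is self-contained; the sample string is made up for illustration):

```python
MIN_CHUNK_SIZE = 32
MAX_CHUNK_SIZE = 16384
EXP_CHUNK_SIZE = 2

def find_tuple(string, subs, start=0, end=None):
    # Chunked forward search: grow the window exponentially; after a match,
    # only allow an earlier one within the current chunk.
    end = len(string) if end is None else end
    result = -1
    chunk_size = MIN_CHUNK_SIZE
    chunk_start = start
    while True:
        chunk_end = min(chunk_start + chunk_size, end)
        if chunk_end < end:
            chunk_end -= 1
        for sub in subs:
            sub_end = min(chunk_end + len(sub), end)
            new_result = string.find(sub, chunk_start, sub_end)
            if new_result != -1:
                if new_result == chunk_start:
                    return new_result
                chunk_end = new_result - 1  # Only allow earlier match
                result = new_result
        if result != -1 or chunk_end >= end:
            return result  # Found match or searched entire range
        chunk_start = chunk_end + 1
        chunk_size = min(chunk_size * EXP_CHUNK_SIZE, MAX_CHUNK_SIZE)

# 'cd' occurs at index 1000, 'ab' at index 2002; the earliest match wins
string = '_' * 1000 + 'cd' + '_' * 1000 + 'ab' + '_' * 1000
assert find_tuple(string, ('ab', 'cd')) == 1000
assert find_tuple(string, ('xy',)) == -1
```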


C call hierarchy

find_sub() or find_subs() is called based on the argument type, using an inline function. For tuples, find_sub() is called for length 1 and chunk_find_sub() for more than one element:

```mermaid
graph TD;
    unicode_find_impl-.->find;
    unicode_index_impl-.->find;
    unicode_rfind_impl-.->find;
    unicode_rindex_impl-.->find;
    find-->find_sub;
    find-->find_subs;
    find_subs-->find_sub;
    find_subs-->chunk_find_sub;
    find_sub-->fast_find_sub;
    chunk_find_sub-->fast_find_sub;
```

Calibration

MIN_CHUNK_SIZE and MAX_CHUNK_SIZE were calibrated on this benchmark:

  • 32 was the highest minimum size that beats all other algorithms in the best case; setting it any lower would hurt performance for substrings found after the first chunk.
  • 16384 was the highest maximum size with a measurable improvement in the worst case; setting it any higher would only hurt performance in the average case.
| MIN_CHUNK_SIZE | 4 | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 1024 | unit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| find chars best case | 1.01 | 1.00 | 1.07 | 1.09 | 1.06 | 1.10 | 1.06 | 1.10 | 1.10 | x 66.7 nsec |
| find chars mixed case | 1.00 | 1.01 | 1.10 | 1.14 | 1.10 | 1.17 | 1.09 | 1.23 | 1.24 | x 75.3 nsec |
| find subs best case | 1.00 | 1.01 | 1.02 | 1.05 | 1.00 | 1.04 | 1.01 | 1.01 | 1.01 | x 70.9 nsec |
| find subs mixed case | 1.00 | 1.01 | 1.06 | 1.22 | 1.39 | 2.22 | 3.31 | 5.51 | 10.0 | x 84.1 nsec |
| rfind chars best case | 1.02 | 1.01 | 1.02 | 1.00 | 1.01 | 1.00 | 1.00 | 1.03 | 1.02 | x 92.9 nsec |
| rfind chars mixed case | 1.00 | 1.01 | 1.06 | 1.12 | 1.29 | 1.68 | 2.34 | 3.69 | 6.12 | x 98.8 nsec |
| rfind subs best case | 1.00 | 1.01 | 1.02 | 1.01 | 1.03 | 1.00 | 1.00 | 1.02 | 1.02 | x 96.9 nsec |
| rfind subs mixed case | 1.00 | 1.00 | 1.06 | 1.08 | 1.29 | 1.89 | 2.75 | 4.54 | 8.05 | x 106 nsec |
| MAX_CHUNK_SIZE | 1024 | 2048 | 4096 | 8192 | 16384 | 32768 | 65536 | unit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| find chars worst case | 1.63 | 1.27 | 1.11 | 1.09 | 1.01 | 1.00 | 1.05 | x 27.7 usec |
| find subs worst case | 1.03 | 1.01 | 1.00 | 1.00 | 1.00 | 1.03 | 1.00 | x 1.47 msec |
| find many prefixes | 1.08 | 1.04 | 1.02 | 1.00 | 1.00 | 1.47 | 1.47 | x 24.7 msec |
| find many infixes | 1.08 | 1.03 | 1.01 | 1.00 | 1.00 | 1.45 | 1.44 | x 22.6 msec |
| rfind chars worst case | 1.20 | 1.00 | 1.17 | 1.17 | 1.01 | 1.01 | 1.15 | x 1.03 msec |
| rfind subs worst case | 1.04 | 1.02 | 1.01 | 1.01 | 1.01 | 1.01 | 1.00 | x 1.44 msec |
| rfind many suffixes | 1.09 | 1.04 | 1.02 | 1.01 | 1.00 | 1.00 | 1.00 | x 24.4 msec |
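For intuition, the window sizes implied by these constants (assuming the chunk starts at MIN_CHUNK_SIZE and doubles via EXP_CHUNK_SIZE until it reaches the cap) grow as follows:

```python
MIN_CHUNK_SIZE, MAX_CHUNK_SIZE, EXP_CHUNK_SIZE = 32, 16384, 2

# List each distinct chunk size the search can use
sizes = []
size = MIN_CHUNK_SIZE
while size < MAX_CHUNK_SIZE:
    sizes.append(size)
    size = min(size * EXP_CHUNK_SIZE, MAX_CHUNK_SIZE)
sizes.append(size)  # all later chunks stay at the cap

print(sizes)  # [32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384]
```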

Previous discussion

Footnotes

  1. Using the regex module for reverse search²

  2. memrchr() is not available on macOS

  3. Expected, as find tuple does more work in the worst case

@nineteendo (Owner) commented Jun 27, 2024

> We cooperate further to make this fit into a wider picture so that the changes made are sensible from all aspects.

We can still do that afterwards; I first need to try to gain traction on Discourse (assuming Discord was a typo).

> send me a complete post

> By the way, the summary of this PR is rendering as raw markdown to me at the moment.

That's intentional, otherwise you can't copy it. It actually looks like this:

````md
...
````

@nineteendo (Owner) commented:

@dg-pb, are you running into problems?

@dg-pb commented Jun 27, 2024

You cannot choose both; it is one or the other.

Either you are on your own, doing your own thing regardless of what I say, or you start listening and concentrating on what needs to be done.

There is nothing wrong if you know what to do and you have your own plan, but in this case I don't see how I can be of any help.

@nineteendo (Owner) commented Jun 28, 2024

I would like to say you have been an invaluable help. I don't think this proposal would have been possible without your insight, especially the chunking. But I don't want to think about this any further, as I don't see how this can be done, sorry. It feels like we're just wasting our time thinking about an unproven idea (like Serhiy's).

If you still have other feedback, I'm of course willing to listen, e.g. if you want to expand the text from the PR summary.
Do you think we can incorporate your previous summary somehow? I'm fine with waiting to make it as detailed as possible.

@nineteendo nineteendo changed the title gh-118184: Support tuples for find, index, rfind & rindex gh-118184: Support tuples for find(), index(), rfind() & rindex() Jun 28, 2024
@nineteendo nineteendo changed the title gh-118184: Support tuples for find(), index(), rfind() & rindex() Support tuples for [r]find() & [r]index() Jun 28, 2024
@nineteendo: (comment marked as resolved)

@dg-pb commented Jun 28, 2024

My current feedback is as follows:

This PR explored the idea well.

If I was in your place I would take a step back and create a new PR to start clean.

And see how best to incorporate this into the current architecture, keeping in mind the whole set of string methods (as listed in https://discuss.python.org/t/string-search-overview-coverage-and-api/56711/7) and the fact that this algorithm will/should one day be replaced with a more optimal version - thus it should be integrated for an easy swap.

Current integration has a flavour of "let's implement this in the easiest and fastest way" as opposed to "let's implement this so that it is optimal, all things considered".

I understand this can be difficult (it always is for me when I aim higher than my current knowledge and experience) and might take some time. But to me what is important is the attitude; how long it takes is secondary.

If our attitudes differ, then it's no big deal. Maybe our situations in life differ - we have different goals, amounts of time at hand, experience, etc. In that case it is not of benefit to either of us to cooperate, as we will only get in each other's way.

@dg-pb commented Jun 28, 2024

> Do you think we can incorporate your previous summary somehow?

You are free to use whatever material I have written in relation to this idea, as long as the facts in it are not outdated and are still relevant. I think the most accurate, up-to-date evaluation from my side is at https://discuss.python.org/t/string-search-overview-coverage-and-api/56711/7. It is short, simple, straight to the point, and well put into the wider context. But as I said, you can use any material I have written on this as long as it is still relevant to the current state of this.

@dg-pb commented Jun 28, 2024

I don't think it is a good time to be gaining traction after there has already been a lot of it and no significant changes in this PR since.

I intend to keep my word and make one post on your behalf if you so choose.

However, I am not in agreement that either issuing a new PR or gaining traction on Discourse is a step in the right direction at this time.

There are others who think differently (such as @erlend-aasland or @pfmoore, as indicated by you), so maybe it would make more sense for you to ask them?

@nineteendo: (comment marked as resolved)

@erlend-aasland commented Jun 28, 2024

> If I was in your place I would take a step back and create a new PR to start clean.

No; please do not use the CPython repo for your own personal experiments! We already have 1.5k open PRs. Create the experimental PR on your own fork.

> I intend to keep my word and make one post on your behalf if you so choose.

Why are you helping circumvent the ban? Please don't.

@pfmoore commented Jun 28, 2024

> Why are you helping circumvent the ban? Please don't.

Agreed.

> Can someone else post a message for me, or do I need to wait until January?

@nineteendo, if you have been given a ban from Discourse, then the idea is that you spend that time reflecting on why you were banned, and come back with a better understanding of how to interact on the forum.

Having the same style of conversation here, and "waiting until January" to simply go back to Discourse with no change in behavior, will simply result in you getting another ban.

> Uh oh, you pinged a random person.

Hardly random - avoiding the people who have tried to give you advice will not help you improve how you interact with the community 🙁

> No; please do not use the CPython repo for your own personal experiments!

Exactly this.

> Paul Moore had said I could just submit a PR

I assumed you could (and would) develop something that was ready for review. Not that you would start another long discussion on the tracker. Please don't mischaracterise what I said on Discourse as support for what you're doing here.

@dg-pb commented Jun 28, 2024

> No; please do not use the CPython repo for your own personal experiments! We already have 1.5k open PRs. Create the experimental PR on your own fork.

This is what I meant.

@dg-pb: (comment marked as resolved)

@dg-pb commented Jun 28, 2024

> Hardly random - avoiding the people who have tried to give you advice will not help you improve how you interact with the community 🙁

No, I actually pinged another Paul Moore accidentally at first. :)

@nineteendo (Owner) commented Jun 28, 2024

> If I was in your place I would take a step back and create a new PR to start clean.
> And see how to best incorporate this into the current architecture keeping in mind the whole set of string methods

Making the necessary adjustments on a new branch would also work, most likely like this:

  • stringlib_[r]find_subs()
  • asciilib_[r]find_subs()
  • ucs1lib_[r]find_subs()
  • ucs2lib_[r]find_subs()
  • ucs4lib_[r]find_subs()

But the problem is that these methods either require a tuple (which requires separate handling for strings and bytes) or an array of SUB structs (which would need to be allocated on the heap, which is slow). So the current approach seems like the best option.

> Current integration has a flavour of "let's implement this in the easiest and fastest way"

I've tried a lot of different things in this pull request; the current implementation is simply the most performant.

> I intend to keep my word and make one post on your behalf if you so choose.

Just post a link in the existing thread; it was never my intention to circumvent the ban, sorry. I've done that on a different forum in the past and became a moderator afterwards, but the attitude is vastly different here, so that seems like a very bad idea. Their patience with me is gone. I only realised recently this might be seen as ban evasion.

> the idea is that you spend that time reflecting on why you were banned

I was banned for creating this thread, which is an "idea" to improve Discourse, while I was asked not to post ANY new ideas until 2025. I've already created drafts for then: this one, and #3, which I hope are more fleshed out. I will ask in September if I still need to wait until then, as the only thing left to do is write a PEP and I would like to get some feedback first.

> Please don't mischaracterise what I said on Discourse as support for what you're doing here.

Eh, this is my own repository; I don't think it's a problem to have a discussion here. Or did you mean "there"?

> No, I actually pinged another Paul Moore accidentally at first. :)

Which is why you shouldn't remove that from your message; it makes the conversation hard to follow for new people.

@dg-pb commented Jun 28, 2024

> But the problem is

Yes, there are difficult problems to be solved for a nice integration of this.

The fact that there is an emphasis on problems and why it can NOT be done better, as opposed to solutions, is the main reason why I don't want to be part of this anymore.

@pfmoore commented Jun 28, 2024

> Eh, this is my own repository; I don't think it's a problem to have a discussion here. Or did you mean "there"?

Sorry, I didn't spot this was your own repo (I get a lot of notifications). Pinging me (and Erlend) on a private development discussion was probably the mistake here.

I'll unsubscribe from this discussion, as I'm not interested in being involved.

@erlend-aasland commented:
@nineteendo, please do not edit my posts, even on your own repo. I'm unsubscribing.

@nineteendo (Owner) commented Jun 28, 2024

> The fact that there is an emphasis on problems and why it can NOT be done better, as opposed to solutions, is the main reason why I don't want to be part of this anymore.

I'm trying to find a solution, as you still want to improve this, but so far I haven't found anything.

The functions in stringlib are defined using STRINGLIB, so we would need to use STRINGLIB(find_subs). We need to somehow pass the substrings, either as a tuple or on the heap; you can't get around that:

  • A tuple requires separate handling for strings and bytes, maybe using a macro? If we define separate methods, we could just as well keep the implementation where it is now.
  • Using the heap was slow in my previous attempt; should I try again with a single alloc? I don't have too high hopes.

I suggest you look into it before deciding whether I need to pursue it. At some point you must give up when something is not feasible.

@nineteendo (Owner) commented Jun 28, 2024

I don't think it is good time to be gaining traction after there has already been a lot of it and there were no significant changes in this PR since.

I actually think there have been significant changes since the last benchmark posted there:

  • We're now comparing against the regex module, which has proper support for reverse search
  • The chunk size is now dynamic, so it's the fastest in 10/16 of the cases instead of 5/16 (with a proper comparison)
  • The code is a lot simpler now
  • I've written a much more detailed summary

I would like a proper review, so this can be accepted or rejected. Otherwise, I'll keep thinking about it. (This holds for my other PRs as well, but I'm more passionate about this one.)

So, could you please link to this PR, stating there have been significant changes in the last month? Or just tell me you won't? I'll leave you alone afterwards (also if you refuse). I'll lock this conversation after you've made your decision.

@dg-pb
Copy link

dg-pb commented Jun 28, 2024

I posted a link to this PR here: https://discuss.python.org/t/add-tuple-support-to-more-str-functions/50628/133

It has been mentioned twice in a short period of time. Once in the main thread and once in a comment.

@nineteendo
Copy link
Owner Author

nineteendo commented Jun 28, 2024

Thanks again for all the help, I couldn't have done it without you. I'll leave this open in case someone wants to talk to me directly.

@dg-pb
Copy link

dg-pb commented Jun 28, 2024

I'm trying to find a solution, as you still want to improve this, but so far I haven't found anything yet.

I don't have anything definite yet either, but I know that this can be done and probably should be (for this PR to have a fair chance). I had an initial look, and a rough view of how it could work is as follows:

1. Implement find_horspool_many in fastsearch.h

  1. It should use horspool_find and find_char under the hood.
  2. gh-119702: New dynamic algorithm selection for string search (+ rfind alignment) python/cpython#120025 eliminates many function calls so it will be easy to make use of it - it is bi-directional and handles most of cases.
  3. It should support not only find, but count as well: essentially, all the same features as horspool_find.

This way, it will be easy to swap in another solution when/if someone decides to implement something else. This is also a big selling point for the PR, because the biggest criticism is that this is not a theoretically optimal solution. It provides a good answer to that: "yes, but it introduces an architecture where a more optimal solution is an easy drop-in, and it provides a very good and practical interim solution which is most efficient for 95% (or whatever) of use cases".
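To make the intended interface concrete, here is a naive plain-C stand-in for what a find_horspool_many could look like (the name and signature are hypothetical; the real version would reuse horspool_find's skip table rather than this quadratic scan): it returns the leftmost offset at which any of the needles matches, or -1.

```c
#include <stddef.h>
#include <string.h>

/* Naive multi-needle search: for each position, try every needle.
 * Only illustrates the contract, not the intended Horspool algorithm. */
static ptrdiff_t
find_many_naive(const char *s, size_t n,
                const char *const *subs, const size_t *lens, size_t count)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < count; j++) {
            /* needle must fit in the remaining haystack */
            if (lens[j] <= n - i && memcmp(s + i, subs[j], lens[j]) == 0) {
                return (ptrdiff_t)i;
            }
        }
    }
    return -1;
}
```

The bi-directional, counting version would keep this contract but take a direction and maxcount, matching horspool_find.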

2. Implement FASTSEARCH_MANY in fastsearch.h

  1. This has the same purpose as FASTSEARCH, but for many substrings. I.e. handles special cases and eventually calls find_horspool_many.
  2. Some special cases will need to be handled separately (same as in FASTSEARCH). However, that is good, because there are cases where clever things can be applied, such as:
    2.1. If all substrings are 1-character strings, then a similar approach to split_whitespace can be used, which will be much faster.
    2.2. The preparation step only needs to be done once (either here or in horspool_find_many).

3. Figure out how to plug it in.

This will be the trickiest part to do nicely and with minimal changes, at least for me, because I have not done any work in these parts. Maybe it would be easier for you, since you have?

From my initial look, the solution lies somewhere between unicodeobject.c:any_find_slice and the find.h methods. But to figure this out properly I would need 1-2 days to digest it.
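The 1-character special case from 2.1 could be sketched in plain C with a 256-entry membership table (the function name is hypothetical, and the real STRINGLIB code would also have to handle the wider unicode kinds): one pass over the haystack, one table lookup per byte.

```c
#include <stddef.h>

/* If every needle is a single byte, build a membership table once and
 * scan the haystack in a single pass. */
static ptrdiff_t
find_any_char(const char *s, size_t n, const char *chars, size_t count)
{
    unsigned char table[256] = {0};
    for (size_t i = 0; i < count; i++) {
        table[(unsigned char)chars[i]] = 1;
    }
    for (size_t i = 0; i < n; i++) {
        if (table[(unsigned char)s[i]]) {
            return (ptrdiff_t)i;
        }
    }
    return -1;
}
```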

My plan

Before python#120025 is resolved, I am giving strings a break.

If it is implemented, it simplifies things a fair bit, so it is more productive to wait for it.

Also, I want to work on something else for a while - I had my fair dose of strings in the last month or 2.

After I come back (when python#120025 is resolved), these are the things I will look at (most likely in this exact same order):

  1. Adding maxcount argument to find
  2. Implementing this PR
  3. Adding keepsep argument to split
  4. Implementing split(tuple)

For you

This is a starting point and things to think about if you want to work on this.

The way I see it, the most productive use of your time would be to implement a "mock" FASTSEARCH_MANY and work out how to plug it in.

The actual implementation of FASTSEARCH_MANY and horspool_find_many will be easy once this is done, because I have worked on fastsearch.h and know every single thing there, and the actual algorithm of this PR is in good shape.

Otherwise, you can concentrate on something else and come back to this once python#120025 is resolved and we can figure this out together then.

If you wish, we can continue on this path, but in this case there is 1 condition: no more posting in discourse and no more public PRs until the above is complete and in presentable state.

@nineteendo
Copy link
Owner Author

nineteendo commented Jun 28, 2024

Let's try tuples; that seems the easiest. Looks like chunk_find_sub() needs to be implemented for strings and bytes separately:

static int
PARSE_SUB_OBJ(PyObject **subobj, void **sub, int *sub_kind, Py_ssize_t *sub_len)
{
    *sub = PyUnicode_DATA(*subobj);
    *sub_kind = PyUnicode_KIND(*subobj);
    *sub_len = PyUnicode_GET_LENGTH(*subobj);
    return 0;
}

static Py_ssize_t
chunk_find_sub(const void *str, int kind, Py_ssize_t len,
               PyObject *subobj,
               Py_ssize_t chunk_start, Py_ssize_t chunk_end,
               Py_ssize_t end, int direction)
{
    int sub_kind;
    void *sub;
    Py_ssize_t sub_len, result;

    assert(chunk_end <= end);
    if (PARSE_SUB_OBJ(&subobj, &sub, &sub_kind, &sub_len)) {
        return -1;
    }

    if (sub_kind > kind) {
        return -1;
    }

    if (chunk_end >= end - sub_len) { // Guard against reading past `end`
        result = fast_find_sub(str, len, sub, sub_kind, sub_len, chunk_start,
                               end, direction);
    }
    else {
        result = fast_find_sub(str, len, sub, sub_kind, sub_len, chunk_start,
                               chunk_end + sub_len, direction);
    }

    if (subobj) {
        // Only needed for the bytes variant, where PARSE_SUB_OBJ would
        // fill a Py_buffer that must be released here.
        PyBuffer_Release(&subbuf);
    }

    return result;
}

@dg-pb
Copy link

dg-pb commented Jul 3, 2024

Think of it this way:

Current single-sub search looks as follows:

find_impl       _Py_bytes_find
    |                | 
any_find_slice    find_internal
       \             /
       find.h:functions
              |
          fastfind.c:FASTSEARCH
              |
          fastfind.c:search_algorithms

To integrate into the same architecture would mean something along the lines:

     unicodeobject.c         |  bytes_methods.c
                             |
       find_impl             |  _Py_bytes_find
       |       |             |     |        |
       | any_find_slice_multi|find_internal |
any_find_slice              \|/       find_internal_multi
#----------------------------+------------------------------
              \             / \          /
               \           /   \        /
                \         /     \      /
#-----------------------------------------------------------
find.h           functions   functions_multi
                     |              |
#-----------------------------------------------------------
fastfind.c      FASTSEARCH    FASTSEARCHMULTI
                         \     /
                     search_algorithms

chunk_find_sub would be placed in fastfind.c, named horspool_find_multi (or something similar), become part of fastfind.c:search_algorithms, and be common to both bytes and strings.

This would be a good initial implementation.

@dg-pb
Copy link

dg-pb commented Jul 3, 2024

This would be the first step and next steps (improvements/ simplifications/special cases) would become evident in the process.

@nineteendo
Copy link
Owner Author

Could you give mermaid diagrams a shot?

```mermaid
graph TD;
    root-->child1;
    root-->child2;
    child1-->grandchild1;
    child1-->grandchild2;
    child2-->grandchild2;
```

@dg-pb
Copy link

dg-pb commented Jul 3, 2024

```mermaid
graph TD;
    subgraph unicodeobject.c
        find_impl-->any_find_slice
        find_impl-->any_find_slice_multi
    end
    subgraph bytes_methods.c
        _Py_bytes_find-->find_internal
        _Py_bytes_find-->find_internal_multi
    end
    subgraph find.h
        any_find_slice-->functions
        find_internal-->functions
        any_find_slice_multi-->functions_multi
        find_internal_multi-->functions_multi
    end
    subgraph fastfind.c
        functions-->FASTSEARCH
        functions_multi-->FASTSEARCHMULTI
        FASTSEARCH-->search_algorithms
        FASTSEARCHMULTI-->search_algorithms
    end
```
