perf(python): optimize pystr deserialize perf #2007

chaokunyang · 2025-01-14T05:49:27Z

What does this PR do?

This PR implements optimized versions of PyUnicode_FromUCS1/Fury_PyUnicode_FromUCS2 for faster performance by:

Replacing the max char check with a SIMD implementation
Casting the UCS2 array to a UCS1 array with SIMD

Related issues

Does this PR introduce any user-facing change?

Does this PR introduce any public API change?
Does this PR introduce any binary protocol compatibility change?

Benchmark

chaokunyang · 2025-01-15T15:22:04Z

cc @penguin-wwy @pandalee99 @theweipeng

pandalee99

This code is very efficient, very nice!

Maybe we can reduce the repetitive code.

  // Handle remaining elements
  for (; i < length; i++) {
    if (arr[i] > max_sse) {
      max_sse = arr[i];
    }

It's just the way it's written. It's nothing serious.

chaokunyang · 2025-01-15T16:34:08Z

python/pyfury/_util.pyx

            # PyUnicode_FromASCII
-            return PyUnicode_DecodeLatin1(buf, size, "strict")
+            return <unicode>Fury_PyUnicode_FromUCS1(buf, size)
+            # return PyUnicode_DecodeLatin1(buf, size, "strict")


If I use PyUnicode_DecodeLatin1 directly here, it's faster on macOS, which is unexpected since my implementation uses SIMD. If I invoke PyUnicode_DecodeLatin1 directly inside PyUnicode_FromUCS1, it's slower too. @penguin-wwy do you have any ideas?

penguin-wwy · 2025-01-23

Could you describe the testing method? The tests I wrote myself do not show this issue.

    # integration_tests/cpython_benchmark/fury_benchmark.py
    STRING = "sjuveaibngurbzsivbrubiasb3r93284r92r1209130r0fa;2''j93r2nfln''[]\=-_+/,./!@$#%^&*()i9124u0hpq[jnzj0r9h034-2iu1058]"

    def micro_benchmark():
        runner.bench_func(
            "fury_string", fury_object, language, not args.no_ref, STRING
        )
        runner.bench_func(
            "fury_large_string", fury_object, language, not args.no_ref, STRING * 10000
        )

Using PyUnicode_FromUCS1:
fury_string: Mean +- std dev: 54.7 us +- 2.5 us
fury_large_string: Mean +- std dev: 255 us +- 24 us

Using Fury_PyUnicode_FromUCS1:
fury_string: Mean +- std dev: 53.8 us +- 2.0 us
fury_large_string: Mean +- std dev: 236 us +- 6 us

chaokunyang added 2 commits January 13, 2025 00:33

add get uint16_t array max value util

fcb620c

add SMID copy uint16 array to uint8 array

f68dce4

chaokunyang requested a review from PragmaTwice as a code owner January 14, 2025 05:49

chaokunyang marked this pull request as draft January 14, 2025 05:49

pandalee99 self-requested a review January 14, 2025 15:26

chaokunyang added 11 commits January 15, 2025 01:01

skip avx for python wheel

eb7f7b8

enable avx for cpp test

84e0b0b

implement pyunicode library

9fd56f0

use pyunicode for python ucs1/2 string decoding

77fbec9

remove avx getMaxValue and copyValue

ec2c4d4

rename copyValue to copyArray

a0d74f1

add header and #pragma once

8e2a4b2

add cstdint include

d1d02e7

lint code

221a6f1

add #include <cassert>

4793946

remove array util inline

6f0a64b

chaokunyang force-pushed the optimize_pystr_deserialize_perf branch from 8ba4b1b to 6f0a64b Compare January 15, 2025 14:34

chaokunyang added 9 commits January 15, 2025 22:36

include <stdlib.h>

2ebfbc8

fix include

ad2f28a

add #pragma once

ea206d9

fix include

d2627fb

fix include

28aaf2c

fix include

e326271

add Python.h include

1ef388c

lint code

d4837ff

optimize include

8fe4de7

chaokunyang marked this pull request as ready for review January 15, 2025 15:19

remove comments

a940ba3

pandalee99 reviewed Jan 15, 2025

chaokunyang commented Jan 15, 2025

Merge branch 'main' into optimize_pystr_deserialize_perf

efebbfb
