@chaokunyang chaokunyang commented Jan 14, 2025

What does this PR do?

This PR implements an optimized version of PyUnicode_FromUCS1/PyUnicode_FromUCS2 (Fury_PyUnicode_FromUCS1/Fury_PyUnicode_FromUCS2) for faster performance by:

  • Replacing the max-char check with a SIMD implementation
  • Casting the UCS2 array to a UCS1 array with SIMD
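
As a rough illustration of the two steps above, here is a scalar Python sketch of the logic that the SIMD code accelerates. It is not the PR's implementation: the names `max_code_unit`, `decode_ucs2`, and `NATIVE_UTF16` are hypothetical, and the buffer is assumed to hold native-endian UCS2 code units with no surrogate pairs.

```python
import sys

# Assumption: the UCS2 data uses the machine's native byte order.
NATIVE_UTF16 = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"


def max_code_unit(buf: bytes) -> int:
    """Return the largest 16-bit code unit in a UCS2 buffer (0 if empty).

    Scalar stand-in for the max-char check that the PR vectorizes.
    """
    if not buf:
        return 0
    # View the raw bytes as unsigned 16-bit integers without copying.
    units = memoryview(buf).cast("H")
    return max(units)


def decode_ucs2(buf: bytes) -> str:
    """Decode UCS2, narrowing to one byte per character when possible."""
    if max_code_unit(buf) <= 0xFF:
        # Every code unit fits in one byte: narrow UCS2 -> UCS1 (Latin-1).
        narrowed = bytes(memoryview(buf).cast("H").tolist())
        return narrowed.decode("latin-1")
    # At least one code unit needs two bytes: keep the 16-bit form.
    return buf.decode(NATIVE_UTF16)


if __name__ == "__main__":
    sample = "caf\u00e9".encode(NATIVE_UTF16)
    print(hex(max_code_unit(sample)), decode_ucs2(sample))
```

The max-char scan decides which storage width the resulting string needs, which is why it runs before the narrowing cast.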

Related issues

Does this PR introduce any user-facing change?

  • Does this PR introduce any public API change?
  • Does this PR introduce any binary protocol compatibility change?

Benchmark

@chaokunyang chaokunyang marked this pull request as draft January 14, 2025 05:49
@pandalee99 pandalee99 self-requested a review January 14, 2025 15:26
@chaokunyang chaokunyang force-pushed the optimize_pystr_deserialize_perf branch from 8ba4b1b to 6f0a64b Compare January 15, 2025 14:34
@chaokunyang chaokunyang marked this pull request as ready for review January 15, 2025 15:19

@pandalee99 pandalee99 left a comment

This code is very efficient, very nice!

Maybe we can factor out the repetitive code.

  // Handle remaining elements
  for (; i < length; i++) {
    if (arr[i] > max_sse) {
      max_sse = arr[i];
    }
  }

This is only about how the code is written; it's nothing serious.

# PyUnicode_FromASCII
return PyUnicode_DecodeLatin1(buf, size, "strict")
return <unicode>Fury_PyUnicode_FromUCS1(buf, size)
# return PyUnicode_DecodeLatin1(buf, size, "strict")

@chaokunyang (Collaborator, Author) replied:

If I use PyUnicode_DecodeLatin1 directly here, it's faster on macOS, which is unexpected since my implementation uses SIMD. If I invoke PyUnicode_DecodeLatin1 directly in PyUnicode_FromUCS1, it's slower too. @penguin-wwy do you have any ideas?
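
One way to narrow down a discrepancy like this is to time the built-in Latin-1 decoder in isolation, outside the serialization path. The sketch below does that with the standard `timeit` module. The helper `time_latin1_decode` and the payload sizes are hypothetical choices for illustration and are not part of the PR or its benchmark suite.

```python
import timeit


def time_latin1_decode(payload: bytes, repeat: int = 3, number: int = 100) -> float:
    """Return the best observed per-call time, in seconds, for a Latin-1 decode."""
    timings = timeit.repeat(
        lambda: payload.decode("latin-1"),
        repeat=repeat,
        number=number,
    )
    # Taking the minimum reduces noise from other processes on the machine.
    return min(timings) / number


if __name__ == "__main__":
    small = b"x" * 100          # hypothetical short payload
    large = small * 10000       # hypothetical large payload
    print(f"small: {time_latin1_decode(small) * 1e6:.3f} us per decode")
    print(f"large: {time_latin1_decode(large) * 1e6:.3f} us per decode")
```

Comparing these numbers across platforms gives a baseline for the decoder alone, which can then be set against the end-to-end benchmark figures.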

A Contributor replied:

Could you describe the testing method? The tests I wrote myself do not have this issue.

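# integration_tests/cpython_benchmark/fury_benchmark.py
# The backslash is written as "\\=" because "\=" is an invalid escape sequence;
# the resulting string value is unchanged.
STRING = "sjuveaibngurbzsivbrubiasb3r93284r92r1209130r0fa;2''j93r2nfln''[]\\=-_+/,./!@$#%^&*()i9124u0hpq[jnzj0r9h034-2iu1058]"

# runner, fury_object, language, and args are defined elsewhere in this file.
def micro_benchmark():
    # Short string: dominated by per-call overhead.
    runner.bench_func(
        "fury_string", fury_object, language, not args.no_ref, STRING
    )
    # Large string: dominated by the string decoding cost.
    runner.bench_func(
        "fury_large_string", fury_object, language, not args.no_ref, STRING * 10000
    )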

Using PyUnicode_FromUCS1:
fury_string: Mean +- std dev: 54.7 us +- 2.5 us
fury_large_string: Mean +- std dev: 255 us +- 24 us

Using Fury_PyUnicode_FromUCS1:
fury_string: Mean +- std dev: 53.8 us +- 2.0 us
fury_large_string: Mean +- std dev: 236 us +- 6 us
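
To put the figures above in relative terms, the following sketch computes the fractional reduction in mean time for each benchmark. The helper `relative_speedup` is a hypothetical name introduced here; the input values are the means reported above.

```python
def relative_speedup(baseline_us: float, candidate_us: float) -> float:
    """Return the fractional reduction in mean time relative to the baseline."""
    return (baseline_us - candidate_us) / baseline_us


# Mean times in microseconds: (PyUnicode_FromUCS1, Fury_PyUnicode_FromUCS1).
results = {
    "fury_string": (54.7, 53.8),
    "fury_large_string": (255.0, 236.0),
}

for name, (baseline, candidate) in results.items():
    print(f"{name}: {relative_speedup(baseline, candidate):.1%} faster")
# prints:
# fury_string: 1.6% faster
# fury_large_string: 7.5% faster
```

Both differences are smaller than the standard deviation reported for the baseline run (2.5 us and 24 us respectively), so they should be read with that noise in mind.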

3 participants