-
We should definitely investigate whether Transformers can be applied here. You are correct that we are focused on the performance of inference, not training. As for GPUs, the model needs to be able to run on-device on all types of hardware, everything from high-resource data centers to low-resource smartphones. Even on devices with a GPU available, we should be cognizant of battery use. A project primarily focused on adding CUDA support wouldn't be compelling; we already know we can do that. We'd like to make the model itself more efficient, ideally to the point that it doesn't need CUDA. The baseline is that we have a hand-crafted trie-based dictionary data structure that is very fast but is less accurate and requires more data. The LSTM is smaller and more accurate but uses more processing power. Any new model should be benchmarked against both the dictionary and the LSTM, in environments without a GPU.
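
For concreteness, here is a minimal sketch of the kind of CPU-only comparison described above. `DictionarySegmenter`, `LstmSegmenter`, and the `segment()` method are placeholder names, not real APIs from the repository; the point is only to measure wall-clock time and peak memory for both baselines on the same test corpus.

```python
# Hypothetical benchmark harness: times a segmenter and records its peak
# Python-heap memory use. Only the standard library is used; the segmenter
# objects themselves are stand-ins for the dictionary and LSTM baselines.
import time
import tracemalloc


def benchmark(segmenter, sentences):
    """Return (elapsed_seconds, peak_bytes) for segmenting all sentences on the CPU."""
    tracemalloc.start()
    start = time.perf_counter()
    for text in sentences:
        segmenter.segment(text)  # placeholder method name
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed, peak


# Example usage (names are placeholders):
# for name, seg in [("dictionary", DictionarySegmenter()), ("lstm", LstmSegmenter())]:
#     secs, peak = benchmark(seg, test_sentences)
#     print(f"{name}: {secs:.3f} s, peak {peak / 1e6:.1f} MB")
```

Accuracy (e.g. against a gold-segmented corpus) and the size of the model/data payload would need to be reported alongside these numbers, since the trade-off between accuracy, data size, and processing power is the whole point of the comparison.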
-
Hi @sffc, I have submitted my proposal for “Using AI to Better Segment Complex Scripts.” I really hope it doesn't look like total LLM spam 🙇♀️; I put a lot of effort into organizing it clearly. Thanks again for all your time and guidance! I've learned a lot from the Unicode developers, and I'm really happy to have been part of the process, even if I'm not selected.

-
I’ve realized that this segmentation system was originally designed around 2019–2020, at a time when Transformer-based architectures were still emerging and NVIDIA GPUs were primarily seen as gaming hardware rather than standard ML infrastructure. The system reflects that era: it relies on LSTM and AdaBoost models, and I found no evidence of built-in GPU acceleration in the current codebase.
Given that, I believe we should first modernize the current baseline by:
Although I understand that the performance of segmentation here mostly refers to inference rather than training, optimizing GPU utilization during large-scale evaluation, multilingual experiments, or re-training could significantly accelerate development iterations and model tuning.
Adding optional GPU support could bring:
If my GSoC proposal is accepted, I'd like to submit a pull request that modularizes GPU/CPU device handling in pick_lstm_model() and benchmarks the speed and memory performance of both modes. Would that be necessary?
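To make that concrete, here is a minimal sketch of what such device handling could look like, assuming the LSTM is loaded through TensorFlow/Keras; the helper name pick_device and the force_cpu flag are hypothetical, and only the tf.config / tf.device calls are actual TensorFlow APIs.

```python
# Hypothetical device-selection helper: prefers a visible GPU but can be
# forced onto the CPU so benchmarks reflect low-resource hardware.
import tensorflow as tf


def pick_device(force_cpu: bool = False) -> str:
    """Return a TensorFlow device string, falling back to the CPU."""
    if not force_cpu and tf.config.list_physical_devices("GPU"):
        return "/GPU:0"
    return "/CPU:0"


def run_inference(model, inputs, force_cpu: bool = False):
    # Pin the forward pass to the chosen device so a CPU-only run measures
    # exactly what would ship on devices without CUDA.
    with tf.device(pick_device(force_cpu)):
        return model.predict(inputs)
```

Keeping the selection in a single helper would let the same code path serve both GPU-equipped evaluation runs and the CPU-only benchmarks mentioned above.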
By the way, I would like to sincerely thank the mentors at Unicode for their patience and support. I know I've asked quite a few basic or naive questions along the way, and I truly appreciate the time and insight you've shared. I've been exploring different parts of the Unicode GSoC community, and this project feels like the best fit for my technical stack so far.