Error splitting the input into NAL units. #7427

Open
MengHao666 opened this issue Feb 28, 2025 · 2 comments

Comments

@MengHao666

Describe the bug

I am trying to fine-tune Qwen2.5-VL on 16 × 80 GB GPUs using LLaMA-Factory with preprocessing_num_workers=16. However, I hit the following error and the program seems to have crashed. The error seems to come from the datasets library.

The error log is as follows:

Converting format of dataset (num_proc=16): 100%|█████████▉| 19265/19267 [11:44<00:00,  5.88 examples/s]
Converting format of dataset (num_proc=16): 100%|█████████▉| 19266/19267 [11:44<00:00,  5.02 examples/s]
Converting format of dataset (num_proc=16): 100%|██████████| 19267/19267 [11:44<00:00,  5.44 examples/s]
Converting format of dataset (num_proc=16): 100%|██████████| 19267/19267 [11:44<00:00, 27.34 examples/s]

Running tokenizer on dataset (num_proc=16):   0%|          | 0/19267 [00:00<?, ? examples/s]
Invalid NAL unit size (45405 > 35540).
Invalid NAL unit size (86720 > 54856).
Invalid NAL unit size (7131 > 3225).
missing picture in access unit with size 54860
Invalid NAL unit size (48042 > 33645).
missing picture in access unit with size 3229
missing picture in access unit with size 33649
Invalid NAL unit size (86720 > 54856).
Invalid NAL unit size (48042 > 33645).
Error splitting the input into NAL units.
missing picture in access unit with size 35544
Invalid NAL unit size (45405 > 35540).
Error splitting the input into NAL units.
Error splitting the input into NAL units.
Invalid NAL unit size (8187 > 7069).
missing picture in access unit with size 7073
Invalid NAL unit size (8187 > 7069).
Error splitting the input into NAL units.
Invalid NAL unit size (7131 > 3225).
Error splitting the input into NAL units.
Invalid NAL unit size (14013 > 5998).
missing picture in access unit with size 6002
Invalid NAL unit size (14013 > 5998).
Error splitting the input into NAL units.
Invalid NAL unit size (17173 > 7231).
missing picture in access unit with size 7235
Invalid NAL unit size (17173 > 7231).
Error splitting the input into NAL units.
Invalid NAL unit size (16964 > 6055).
missing picture in access unit with size 6059
Invalid NAL unit size (16964 > 6055).
Exception in thread Thread-9 (accepter)Error splitting the input into NAL units.
:
Traceback (most recent call last):
  File "/opt/conda/envs/python3.10.13/lib/python3.10/threading.py", line 1016, in _bootstrap_inner

Running tokenizer on dataset (num_proc=16):   0%|          | 0/19267 [13:22<?, ? examples/s]    self.run()
  File "/opt/conda/envs/python3.10.13/lib/python3.10/threading.py", line 953, in run
    Invalid NAL unit size (7032 > 2927).
missing picture in access unit with size 2931
self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/python3.10.13/lib/python3.10/site-packages/multiprocess/managers.py", line 194, in accepter
Invalid NAL unit size (7032 > 2927).
Error splitting the input into NAL units.
    t.start()
  File "/opt/conda/envs/python3.10.13/lib/python3.10/threading.py", line 935, in start
    Invalid NAL unit size (28973 > 6121).
missing picture in access unit with size 6125
_start_new_thread(self._bootstrap, ())Invalid NAL unit size (28973 > 6121).

RuntimeError: can't start new threadError splitting the input into NAL units.

Invalid NAL unit size (4411 > 296).
missing picture in access unit with size 300
Invalid NAL unit size (4411 > 296).
Error splitting the input into NAL units.
Invalid NAL unit size (14414 > 1471).
missing picture in access unit with size 1475
Invalid NAL unit size (14414 > 1471).
Error splitting the input into NAL units.
Invalid NAL unit size (5283 > 1792).
missing picture in access unit with size 1796
Invalid NAL unit size (5283 > 1792).
Error splitting the input into NAL units.
Invalid NAL unit size (79147 > 10042).
missing picture in access unit with size 10046
Invalid NAL unit size (79147 > 10042).
Error splitting the input into NAL units.
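
The "Invalid NAL unit size" and "Error splitting the input into NAL units" lines look like messages from FFmpeg's H.264 decoder rather than from datasets, and the environment below lists av (PyAV). If that reading is right, some video files in the dataset may be truncated or corrupt. The following is a minimal sketch for finding such files, assuming the videos are local files readable by PyAV; `decode_first_frames` and `find_undecodable` are hypothetical helper names, not part of any library. FFmpeg sometimes only logs these warnings and recovers, so an empty result does not prove every file is intact.

```python
from typing import Callable, Iterable, List, Tuple


def decode_first_frames(path: str, max_frames: int = 8) -> None:
    """Try to decode a few video frames with PyAV; raises if decoding fails."""
    import av  # imported lazily so find_undecodable can be used with another decoder

    with av.open(path) as container:
        for i, _frame in enumerate(container.decode(video=0)):
            if i + 1 >= max_frames:
                break


def find_undecodable(
    paths: Iterable[str],
    decode: Callable[[str], None] = decode_first_frames,
) -> List[Tuple[str, str]]:
    """Return (path, error description) for every file the decoder rejects."""
    bad = []
    for path in paths:
        try:
            decode(path)
        except Exception as exc:  # PyAV error class names differ between versions
            bad.append((path, f"{type(exc).__name__}: {exc}"))
    return bad
```

Running `find_undecodable` over the dataset's video paths in a single process, before launching training, would separate a data problem from the `RuntimeError: can't start new thread` raised in the multiprocess manager.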

Others

No response

Steps to reproduce the bug

None

Expected behavior

Expected to run successfully.

Environment info

transformers==4.49.0
datasets==3.2.0
accelerate==1.2.1
peft==0.12.0
trl==0.9.6
tokenizers==0.21.0
gradio>=4.38.0,<=5.18.0
pandas>=2.0.0
scipy
einops
sentencepiece
tiktoken
protobuf
uvicorn
pydantic
fastapi
sse-starlette
matplotlib>=3.7.0
fire
packaging
pyyaml
numpy<2.0.0
av
librosa
tyro<0.9.0
openlm-hub
qwen-vl-utils

@lhoestq
Member

lhoestq commented Mar 3, 2025

This is the first time I've seen this error :/ Maybe it's an issue with your versions of multiprocess and dill? Make sure they are compatible with datasets.
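
One way to check this is to compare the installed versions with the constraints that the installed datasets package declares for itself. This is a minimal sketch using only the standard library; `report_versions` and `declared_pins` are hypothetical helper names, and the output depends entirely on the local environment.

```python
import re
from importlib import metadata


def report_versions(packages=("datasets", "multiprocess", "dill")):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def declared_pins(dist="datasets", wanted=("multiprocess", "dill")):
    """Return the requirement strings `dist` declares for the wanted packages."""
    try:
        requirements = metadata.requires(dist) or []
    except metadata.PackageNotFoundError:
        return []
    pins = []
    for req in requirements:
        # The package name is everything before the first specifier character.
        name = re.split(r"[\s<>=!~;\[(]", req, maxsplit=1)[0].lower()
        if name in wanted:
            pins.append(req)
    return pins


if __name__ == "__main__":
    print("installed:", report_versions())
    print("datasets requires:", declared_pins())
```

If an installed version falls outside a printed constraint, reinstalling datasets with pip should pull in versions it accepts.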

@MengHao666
Author

First time I see this error :/ maybe it's an issue with your version of multiprocess and dill ? Make sure they are compatible with datasets

Are there any recommended versions of multiprocess and dill?
