IFEval fails when multiple gpus are used (for DDP) #2266

Open
al093 opened this issue Aug 30, 2024 · 6 comments · May be fixed by #2267

al093 commented Aug 30, 2024

While running IFEval, the library downloads NLTK tokenizers. This is a problem when multiple processes are used (e.g., in DDP inference), because each process performs the download. I think this leads to a race condition and causes the following error:

      from lm_eval.tasks.ifeval import instructions_util
    File "/opt/pyenv-root/versions/3.9.17/lib/python3.9/site-packages/lm_eval/tasks/ifeval/instructions_util.py", line 47, in <module>
      download_nltk_resources()
    File "/opt/pyenv-root/versions/3.9.17/lib/python3.9/site-packages/lm_eval/tasks/ifeval/instructions_util.py", line 44, in download_nltk_resources
      nltk.download("punkt_tab")
    File "/opt/pyenv-root/versions/3.9.17/lib/python3.9/site-packages/nltk/downloader.py", line 774, in download
      for msg in self.incr_download(info_or_id, download_dir, force):
    File "/opt/pyenv-root/versions/3.9.17/lib/python3.9/site-packages/nltk/downloader.py", line 642, in incr_download
      yield from self._download_package(info, download_dir, force)
    File "/opt/pyenv-root/versions/3.9.17/lib/python3.9/site-packages/nltk/downloader.py", line 733, in _download_package
      for msg in _unzip_iter(filepath, zipdir, verbose=False):
    File "/opt/pyenv-root/versions/3.9.17/lib/python3.9/site-packages/nltk/downloader.py", line 2250, in _unzip_iter
      zf.extractall(root)
    File "/opt/pyenv-root/versions/3.9.17/lib/python3.9/zipfile.py", line 1642, in extractall
      self._extract_member(zipinfo, path, pwd)
    File "/opt/pyenv-root/versions/3.9.17/lib/python3.9/zipfile.py", line 1692, in _extract_member
      os.mkdir(targetpath)
  FileExistsError: [Errno 17] File exists: '/home/flyte/nltk_data/tokenizers/punkt_tab/russian'

I think:

  1. The NLTK tokenizer should not be downloaded merely as a side effect of importing the module (see the sketch below).
  2. The download should be guarded when multiple processes are used (e.g., in a Distributed Data Parallel setting).

I used the main branch to reproduce this issue (commit: 8138fd5).
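
A minimal sketch of suggestion 1 (hypothetical, not the harness's actual code): check for the resource on first use and download only if it is missing. Note that this alone does not prevent two processes from racing, so suggestion 2 still needs a process-level guard.

    # Hypothetical helper: fetch punkt_tab on first use, and only when it is
    # missing, instead of calling nltk.download() unconditionally at import time.
    import nltk

    def ensure_punkt_tab() -> None:
        try:
            # find() raises LookupError when the resource is absent.
            nltk.data.find("tokenizers/punkt_tab")
        except LookupError:
            nltk.download("punkt_tab")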

al093 (Author) commented Aug 30, 2024

One workaround for this issue is to download the NLTK resources in a safe manner beforehand (i.e., once, before any worker processes start).
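
For example, something like the following before launching the evaluation (the resource names are taken from the tracebacks in this thread):

    python -c "import nltk; nltk.download('punkt'); nltk.download('punkt_tab')"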

baberabb linked a pull request on Aug 30, 2024 that will close this issue.

baberabb (Contributor) commented

Hi! Thanks for reporting the issue! The PR should handle this. I thought the simplest way was to check for the LOCAL_RANK environment variable, but I'm open to feedback if you have any alternative suggestions.
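
A rough sketch of that approach (hypothetical; the actual change is in the linked PR):

    # Hypothetical sketch of the rank guard described above: only the local
    # main process triggers the download. LOCAL_RANK is assumed to be set by
    # the launcher (e.g. torchrun or accelerate); single-process runs fall
    # back to "0" and download as before.
    import os
    import nltk

    if os.environ.get("LOCAL_RANK", "0") == "0":
        nltk.download("punkt_tab")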

al093 (Author) commented Aug 30, 2024

Thanks for the PR.

I left a suggestion in the PR. I am not sure the NLTK tokenizers need to be downloaded when the module is imported; if possible, that should be refactored.

tanliboy commented Sep 15, 2024

I also ran into the punkt_tab problem:

[rank1]:   File "/home/litan/leaderboard/lm-evaluation-harness/lm_eval/__main__.py", line 450, in <module>
[rank1]:     cli_evaluate()
[rank1]:   File "/home/litan/leaderboard/lm-evaluation-harness/lm_eval/__main__.py", line 369, in cli_evaluate
[rank1]:     results = evaluator.simple_evaluate(
[rank1]:   File "/home/litan/leaderboard/lm-evaluation-harness/lm_eval/utils.py", line 395, in _wrapper
[rank1]:     return fn(*args, **kwargs)
[rank1]:   File "/home/litan/leaderboard/lm-evaluation-harness/lm_eval/evaluator.py", line 277, in simple_evaluate
[rank1]:     results = evaluate(
[rank1]:   File "/home/litan/leaderboard/lm-evaluation-harness/lm_eval/utils.py", line 395, in _wrapper
[rank1]:     return fn(*args, **kwargs)
[rank1]:   File "/home/litan/leaderboard/lm-evaluation-harness/lm_eval/evaluator.py", line 478, in evaluate
[rank1]:     metrics = task.process_results(
[rank1]:   File "/home/litan/leaderboard/lm-evaluation-harness/lm_eval/api/task.py", line 1351, in process_results
[rank1]:     return self.config.process_results(doc, results)
[rank1]:   File "/home/litan/leaderboard/lm-evaluation-harness/lm_eval/tasks/leaderboard/ifeval/utils.py", line 120, in process_results
[rank1]:     out_strict = test_instruction_following_strict(inp, response)
[rank1]:   File "/home/litan/leaderboard/lm-evaluation-harness/lm_eval/tasks/leaderboard/ifeval/utils.py", line 43, in test_instruction_following_strict
[rank1]:     if response.strip() and instruction.check_following(response):
[rank1]:   File "/home/litan/leaderboard/lm-evaluation-harness/lm_eval/tasks/ifeval/instructions.py", line 1580, in check_following
[rank1]:     words = instructions_util.nltk.word_tokenize(value)
[rank1]:   File "/opt/conda/envs/harness/lib/python3.10/site-packages/nltk/tokenize/__init__.py", line 142, in word_tokenize
[rank1]:     sentences = [text] if preserve_line else sent_tokenize(text, language)
[rank1]:   File "/opt/conda/envs/harness/lib/python3.10/site-packages/nltk/tokenize/__init__.py", line 119, in sent_tokenize
[rank1]:     tokenizer = _get_punkt_tokenizer(language)
[rank1]:   File "/opt/conda/envs/harness/lib/python3.10/site-packages/nltk/tokenize/__init__.py", line 105, in _get_punkt_tokenizer
[rank1]:     return PunktTokenizer(language)
[rank1]:   File "/opt/conda/envs/harness/lib/python3.10/site-packages/nltk/tokenize/punkt.py", line 1744, in __init__
[rank1]:     self.load_lang(lang)
[rank1]:   File "/opt/conda/envs/harness/lib/python3.10/site-packages/nltk/tokenize/punkt.py", line 1749, in load_lang
[rank1]:     lang_dir = find(f"tokenizers/punkt_tab/{lang}/")
[rank1]:   File "/opt/conda/envs/harness/lib/python3.10/site-packages/nltk/data.py", line 579, in find
[rank1]:     raise LookupError(resource_not_found)
[rank1]: LookupError:
[rank1]: **********************************************************************
[rank1]:   Resource punkt_tab not found.
[rank1]:   Please use the NLTK Downloader to obtain the resource:

[rank1]:   >>> import nltk
[rank1]:   >>> nltk.download('punkt_tab')
[rank1]:
[rank1]:   For more information see: https://www.nltk.org/data.html

[rank1]:   Attempted to load tokenizers/punkt_tab/english/

[rank1]:   Searched in:
[rank1]:     - '/home/litan/nltk_data'
[rank1]:     - '/opt/conda/envs/harness/nltk_data'
[rank1]:     - '/opt/conda/envs/harness/share/nltk_data'
[rank1]:     - '/opt/conda/envs/harness/lib/nltk_data'
[rank1]:     - '/usr/share/nltk_data'
[rank1]:     - '/usr/local/share/nltk_data'
[rank1]:     - '/usr/lib/nltk_data'
[rank1]:     - '/usr/local/lib/nltk_data'
[rank1]: **********************************************************************

I have the data in my local folder, and I can even load the tokenizer locally:

>>> from nltk import PunktTokenizer
>>> PunktTokenizer("english")
<nltk.tokenize.punkt.PunktTokenizer object at 0x7f6170003a60>

But I got the above error when I ran it with:
accelerate launch -m lm_eval --model_args pretrained=<model>,dtype=bfloat16 --log_samples --output_path eval_results --tasks leaderboard --batch_size 4 --apply_chat_template --fewshot_as_multiturn

Is it related to a race condition?

ian-scale commented

Can we get an update on this (either merge the existing PR fixing this issue, or create a new one if needed)? Happy to work on it, but this issue is blocking multi-GPU evals for me.

baberabb (Contributor) commented

> Can we get an update on this (either merge the existing PR fixing this issue, or create a new one if needed)? Happy to work on it, but this issue is blocking multi-GPU evals for me.

#2267 should fix it. As a workaround you could run python -c "import nltk; nltk.download('punkt')" in your local environment before running lm_eval, and this should handle the error for the time being.
