IFEval fails when multiple gpus are used (for DDP) #2266

Open
al093 opened this issue Aug 30, 2024 · 6 comments · May be fixed by #2267

al093 commented Aug 30, 2024

While running IFEval, the library downloads NLTK tokenizers. This is a problem when multiple processes are used (e.g., in DDP inference), because each process performs the download. I think this leads to a race condition and causes the following error:

      from lm_eval.tasks.ifeval import instructions_util
    File "/opt/pyenv-root/versions/3.9.17/lib/python3.9/site-packages/lm_eval/tasks/ifeval/instructions_util.py", line 47, in <module>
      download_nltk_resources()
    File "/opt/pyenv-root/versions/3.9.17/lib/python3.9/site-packages/lm_eval/tasks/ifeval/instructions_util.py", line 44, in download_nltk_resources
      nltk.download("punkt_tab")
    File "/opt/pyenv-root/versions/3.9.17/lib/python3.9/site-packages/nltk/downloader.py", line 774, in download
      for msg in self.incr_download(info_or_id, download_dir, force):
    File "/opt/pyenv-root/versions/3.9.17/lib/python3.9/site-packages/nltk/downloader.py", line 642, in incr_download
      yield from self._download_package(info, download_dir, force)
    File "/opt/pyenv-root/versions/3.9.17/lib/python3.9/site-packages/nltk/downloader.py", line 733, in _download_package
      for msg in _unzip_iter(filepath, zipdir, verbose=False):
    File "/opt/pyenv-root/versions/3.9.17/lib/python3.9/site-packages/nltk/downloader.py", line 2250, in _unzip_iter
      zf.extractall(root)
    File "/opt/pyenv-root/versions/3.9.17/lib/python3.9/zipfile.py", line 1642, in extractall
      self._extract_member(zipinfo, path, pwd)
    File "/opt/pyenv-root/versions/3.9.17/lib/python3.9/zipfile.py", line 1692, in _extract_member
      os.mkdir(targetpath)
  FileExistsError: [Errno 17] File exists: '/home/flyte/nltk_data/tokenizers/punkt_tab/russian'

I think:

  1. The NLTK tokenizer should not be downloaded merely as a side effect of importing the module (see the sketch below).
  2. The download should be guarded when multiple processes are used (e.g., in a Distributed Data Parallel setting).

I used the main branch to reproduce this issue (commit: 8138fd5).
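
A minimal sketch of suggestion 1 (hypothetical, not the harness's actual code): check for the resource on first use and download only if it is missing. Note that this alone does not prevent two processes from racing, so suggestion 2 still needs a process-level guard.

    # Hypothetical helper: fetch punkt_tab on first use, and only when it is
    # missing, instead of calling nltk.download() unconditionally at import time.
    import nltk

    def ensure_punkt_tab() -> None:
        try:
            # find() raises LookupError when the resource is absent.
            nltk.data.find("tokenizers/punkt_tab")
        except LookupError:
            nltk.download("punkt_tab")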

al093 (Author) commented Aug 30, 2024

One workaround for this issue is to download the NLTK resources in a safe manner beforehand (i.e., once, before any worker processes start).
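
For example, something like the following before launching the evaluation (the resource names are taken from the tracebacks in this thread):

    python -c "import nltk; nltk.download('punkt'); nltk.download('punkt_tab')"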

baberabb linked a pull request on Aug 30, 2024 that will close this issue.

baberabb (Contributor) commented

Hi! Thanks for reporting the issue! The PR should handle this. I thought the simplest way was to check for the LOCAL_RANK environment variable, but I'm open to feedback if you have any alternative suggestions.
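
A rough sketch of that approach (hypothetical; the actual change is in the linked PR):

    # Hypothetical sketch of the rank guard described above: only the local
    # main process triggers the download. LOCAL_RANK is assumed to be set by
    # the launcher (e.g. torchrun or accelerate); single-process runs fall
    # back to "0" and download as before.
    import os
    import nltk

    if os.environ.get("LOCAL_RANK", "0") == "0":
        nltk.download("punkt_tab")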

al093 (Author) commented Aug 30, 2024

Thanks for the PR.

I left a suggestion in the PR. I am not sure the NLTK tokenizers need to be downloaded when the module is imported; if possible, that should be refactored.

tanliboy commented Sep 15, 2024

I also ran into the punkt_tab problem:

[rank1]:   File "/home/litan/leaderboard/lm-evaluation-harness/lm_eval/__main__.py", line 450, in <module>
[rank1]:     cli_evaluate()
[rank1]:   File "/home/litan/leaderboard/lm-evaluation-harness/lm_eval/__main__.py", line 369, in cli_evaluate
[rank1]:     results = evaluator.simple_evaluate(
[rank1]:   File "/home/litan/leaderboard/lm-evaluation-harness/lm_eval/utils.py", line 395, in _wrapper
[rank1]:     return fn(*args, **kwargs)
[rank1]:   File "/home/litan/leaderboard/lm-evaluation-harness/lm_eval/evaluator.py", line 277, in simple_evaluate
[rank1]:     results = evaluate(
[rank1]:   File "/home/litan/leaderboard/lm-evaluation-harness/lm_eval/utils.py", line 395, in _wrapper
[rank1]:     return fn(*args, **kwargs)
[rank1]:   File "/home/litan/leaderboard/lm-evaluation-harness/lm_eval/evaluator.py", line 478, in evaluate
[rank1]:     metrics = task.process_results(
[rank1]:   File "/home/litan/leaderboard/lm-evaluation-harness/lm_eval/api/task.py", line 1351, in process_results
[rank1]:     return self.config.process_results(doc, results)
[rank1]:   File "/home/litan/leaderboard/lm-evaluation-harness/lm_eval/tasks/leaderboard/ifeval/utils.py", line 120, in process_results
[rank1]:     out_strict = test_instruction_following_strict(inp, response)
[rank1]:   File "/home/litan/leaderboard/lm-evaluation-harness/lm_eval/tasks/leaderboard/ifeval/utils.py", line 43, in test_instruction_following_strict
[rank1]:     if response.strip() and instruction.check_following(response):
[rank1]:   File "/home/litan/leaderboard/lm-evaluation-harness/lm_eval/tasks/ifeval/instructions.py", line 1580, in check_following
[rank1]:     words = instructions_util.nltk.word_tokenize(value)
[rank1]:   File "/opt/conda/envs/harness/lib/python3.10/site-packages/nltk/tokenize/__init__.py", line 142, in word_tokenize
[rank1]:     sentences = [text] if preserve_line else sent_tokenize(text, language)
[rank1]:   File "/opt/conda/envs/harness/lib/python3.10/site-packages/nltk/tokenize/__init__.py", line 119, in sent_tokenize
[rank1]:     tokenizer = _get_punkt_tokenizer(language)
[rank1]:   File "/opt/conda/envs/harness/lib/python3.10/site-packages/nltk/tokenize/__init__.py", line 105, in _get_punkt_tokenizer
[rank1]:     return PunktTokenizer(language)
[rank1]:   File "/opt/conda/envs/harness/lib/python3.10/site-packages/nltk/tokenize/punkt.py", line 1744, in __init__
[rank1]:     self.load_lang(lang)
[rank1]:   File "/opt/conda/envs/harness/lib/python3.10/site-packages/nltk/tokenize/punkt.py", line 1749, in load_lang
[rank1]:     lang_dir = find(f"tokenizers/punkt_tab/{lang}/")
[rank1]:   File "/opt/conda/envs/harness/lib/python3.10/site-packages/nltk/data.py", line 579, in find
[rank1]:     raise LookupError(resource_not_found)
[rank1]: LookupError:
[rank1]: **********************************************************************
[rank1]:   Resource punkt_tab not found.
[rank1]:   Please use the NLTK Downloader to obtain the resource:

[rank1]:   >>> import nltk
[rank1]:   >>> nltk.download('punkt_tab')
[rank1]:
[rank1]:   For more information see: https://www.nltk.org/data.html

[rank1]:   Attempted to load tokenizers/punkt_tab/english/

[rank1]:   Searched in:
[rank1]:     - '/home/litan/nltk_data'
[rank1]:     - '/opt/conda/envs/harness/nltk_data'
[rank1]:     - '/opt/conda/envs/harness/share/nltk_data'
[rank1]:     - '/opt/conda/envs/harness/lib/nltk_data'
[rank1]:     - '/usr/share/nltk_data'
[rank1]:     - '/usr/local/share/nltk_data'
[rank1]:     - '/usr/lib/nltk_data'
[rank1]:     - '/usr/local/lib/nltk_data'
[rank1]: **********************************************************************

I have the data in my local folder, and I can even load the tokenizer locally:

>>> from nltk import PunktTokenizer
>>> PunktTokenizer("english")
<nltk.tokenize.punkt.PunktTokenizer object at 0x7f6170003a60>

But I got the above error when I ran it with:
accelerate launch -m lm_eval --model_args pretrained=<model>,dtype=bfloat16 --log_samples --output_path eval_results --tasks leaderboard --batch_size 4 --apply_chat_template --fewshot_as_multiturn

Is it related to a race condition?

ian-scale commented

Can we get an update on this (either merge the existing PR fixing this issue, or create a new one if needed)? Happy to work on it, but this issue is blocking multi-GPU evals for me.

baberabb (Contributor) commented

> Can we get an update on this (either merge the existing PR fixing this issue, or create a new one if needed)? Happy to work on it, but this issue is blocking multi-GPU evals for me.

#2267 should fix it. As a workaround you could run python -c "import nltk; nltk.download('punkt')" in your local environment before running lm_eval, and this should handle the error for the time being.
