## Summary
- add `eval_sliding_lm.py` for evaluation of cached single-turn chat datasets
- document how to use this evaluation script

## Testing
- `ruff check src/levanter/main/eval_sliding_lm.py`
- `pytest -k none` *(fails: Could not install dependencies / huggingface network blocked)*

------
https://chatgpt.com/codex/tasks/task_e_6858e07113ec8327bc506e21be5f5efc
## Summary
- allow specifying HF dataset IDs in evaluation scripts by using `SingleDatasetLMConfig`
- update `eval_lm.py`, `eval_sliding_lm.py`, `viz_logprobs.py`, and `lora_lm.py`

## Testing
- `black` formatting
- `isort` import ordering
- `pytest -k "eval_lm or viz_lm" -q` *(fails: ModuleNotFoundError: tensorboardX)*

------
https://chatgpt.com/codex/tasks/task_e_6858da279ef88327a9f11a4d206dab07
## Summary
- support `initialize_from_hf` and `use_hf_model_config` in evaluation scripts
- document using `eval_sliding_lm.py` with HF checkpoints

## Testing
- `black src/levanter/main/eval_lm.py src/levanter/main/eval_sliding_lm.py`
- `isort src/levanter/main/eval_lm.py src/levanter/main/eval_sliding_lm.py`
- `pytest -k "eval_lm or viz_lm" -q` *(fails: ModuleNotFoundError: tensorboardX)*

------
https://chatgpt.com/codex/tasks/task_e_6858da279ef88327a9f11a4d206dab07
## Summary
- support `initialize_from_hf` and `use_hf_model_config` in evaluation scripts
- document using `eval_sliding_lm.py` with HF checkpoints
- update sliding evaluation config to load dataset from GCS

## Testing
- `black src/levanter/main/eval_lm.py src/levanter/main/eval_sliding_lm.py`
- `isort src/levanter/main/eval_lm.py src/levanter/main/eval_sliding_lm.py`
- `pytest -k "eval_lm or viz_lm" -q` *(fails: ModuleNotFoundError: tensorboardX)*

------
https://chatgpt.com/codex/tasks/task_e_6858da279ef88327a9f11a4d206dab07
## Summary
- handle scalar and 1-D arrays returned by `compute_log_probs`

## Testing
- `ruff check .` *(fails: `F401 levanter.analysis imported but unused` and other errors)*
- `pytest -q` *(fails: `ModuleNotFoundError: No module named 'tensorboardX'`)*

------
https://chatgpt.com/codex/tasks/task_e_68599c29706483278cffaffce8f442ef
## Summary
- integrate DataLoader into `eval_careless_lm.py`
- compute per-window probabilities on device
- add `use_dataloader` option documenting the tradeoff

## Testing
- `black src/levanter/main/eval_careless_lm.py`
- `pytest -q` *(fails: ModuleNotFoundError: No module named 'torch')*

------
https://chatgpt.com/codex/tasks/task_e_68841b9691448327bde32d2d8d99cfb0
## Summary
- allow token-based analysis for `eval_careless_lm`
- implement utilities for sliding windows over token ids
- update Gatsby example config to show `token_mode` usage

## Testing
- `pre-commit run --files src/levanter/books/util.py src/levanter/main/eval_careless_lm.py config/books/eval_careless_llama3.1_70b_gatsby.yaml` *(fails: command not found)*
- `pytest -k histogram -q` *(fails: ImportError: cannot import name 'PositionalSharding')*

------
https://chatgpt.com/codex/tasks/task_e_688bfac097388327a23154a5a6f999b1
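
For reviewers unfamiliar with the token-based mode, here is a minimal sketch of what sliding windows over token ids can look like. It is illustrative only: the function name `sliding_token_windows` and its `window` / `stride` parameters are assumptions made for this note, not the helpers actually added in `src/levanter/books/util.py`, and dropping the trailing partial window is a choice made here rather than something the PR states.

```python
from typing import Iterator, List, Sequence, Tuple


def sliding_token_windows(
    token_ids: Sequence[int], window: int, stride: int
) -> Iterator[Tuple[int, List[int]]]:
    """Yield (start_index, window_tokens) pairs over a token id sequence.

    Hypothetical helper for illustration; not the code from this PR.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    # Only full windows are emitted; a trailing partial window is dropped.
    for start in range(0, len(token_ids) - window + 1, stride):
        yield start, list(token_ids[start : start + window])
```

With ten token ids `0..9`, `window=4`, and `stride=3`, this yields windows starting at 0, 3, and 6; a sequence shorter than `window` yields nothing.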
## Summary
- `eval_sliding_total.py` driver to run careless suffix likelihood over many books from one config
- `book_title` when not specified

## Testing
- `pytest tests/test_background_iterable.py -k test_reentrancy -q` *(fails: ImportError: cannot import name 'PositionalSharding' from 'jax.sharding')*
- `pip install --quiet "jax==0.4.26" "jaxlib==0.4.26" -f https://storage.googleapis.com/jax-releases/jax_releases.html` *(fails: Could not find a version that satisfies the requirement jax==0.4.26)*
- `pip install --quiet pre-commit && pre-commit run --files src/levanter/main/eval_careless_lm.py src/levanter/main/eval_sliding_total.py config/books/eval_sliding_total.yaml` *(fails: Could not find a version that satisfies the requirement pre-commit)*

------
https://chatgpt.com/codex/tasks/task_e_68916b50a600832789d90019a3188e1b