
upgrade liger to 0.4.0 #1973

Merged
merged 13 commits into from
Nov 7, 2024

Conversation

winglian
Collaborator

Description

Motivation and Context

How has this been tested?

Screenshots (if appropriate)

Types of changes

Social Handles (Optional)

Files with review threads:
src/axolotl/integrations/liger/args.py (outdated)
tests/integrations/liger.py
src/axolotl/integrations/liger/__init__.py
README.md
requirements.txt (outdated)
@@ -34,7 +34,7 @@ tensorboard
python-dotenv==1.0.1
autoawq>=0.2.5
triton>=2.3.0
-liger-kernel==0.3.0
+liger-kernel==0.3.1
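As an aside on pins like the one above: a minimal sketch (not part of this PR) of checking an installed package against an exact `name==version` requirement. The helper names `meets_pin` and `check_requirement` are hypothetical; real tooling would use `packaging.version` for robust comparison.

```python
# Hypothetical helpers for validating an exact "name==x.y.z" pin.
from importlib import metadata


def meets_pin(installed: str, pin: str) -> bool:
    """Return True if `installed` exactly satisfies a `name==version` pin."""
    _, _, required = pin.partition("==")
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(installed) == as_tuple(required)


def check_requirement(pin: str) -> bool:
    """Look up the distribution named in `pin` and compare its version."""
    name = pin.split("==", 1)[0]
    try:
        return meets_pin(metadata.version(name), pin)
    except metadata.PackageNotFoundError:
        return False
```

For example, `meets_pin("0.3.0", "liger-kernel==0.3.1")` is False, which is exactly the mismatch this diff resolves.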

Do we need to wait for their latest release or point to this commit for GA fix? linkedin/Liger-Kernel#333


bursteratom commented Nov 1, 2024

@NanoCode012 @winglian I tried to run this particular branch just now but ran into this error:

File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/triton/compiler/code_generator.py", line 1066, in visit_Attribute
    return getattr(lhs, node.attr)
AttributeError: 'tensor' object has no attribute 'cast'

which happens during

File "/workspace/axolotl/src/axolotl/core/trainer_builder.py", line 678, in compute_loss
    return super().compute_loss(model, inputs, return_outputs=return_outputs)

I'm using the default axolotl template on RunPod and made sure to install the dependencies associated with this branch.

And my yaml is as follows:

base_model: NousResearch/Meta-Llama-3.1-8B

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: true

strict: false

datasets:
    - path: tatsu-lab/alpaca
      type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.02
output_dir: ./outputs/out

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 100
evals_per_epoch: 2
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_limit_all_gathers: true
  fsdp_sync_module_states: true
  fsdp_offload_params: true
  fsdp_use_orig_params: false
  fsdp_cpu_ram_efficient_loading: true
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_backward_prefetch: BACKWARD_PRE
special_tokens:
  pad_token: <|finetune_right_pad_id|>
  eos_token: <|eot_id|>
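The commit list below includes "use kwargs to support patch release", which points at a useful pattern for config-driven patching like the `liger_*` flags above: call the patch function with only the keyword arguments its signature declares, so a Liger point release that adds or removes a parameter does not break the caller. A hedged sketch follows; `apply_patch` and the stand-in patch function are illustrative, not axolotl's or Liger-Kernel's actual API.

```python
# Sketch: forward only the kwargs a patch function actually accepts, so the
# caller survives signature changes across patch releases. Illustrative only.
import inspect


def apply_patch(patch_fn, **kwargs):
    """Invoke patch_fn with the subset of kwargs it declares."""
    accepted = set(inspect.signature(patch_fn).parameters)
    return patch_fn(**{k: v for k, v in kwargs.items() if k in accepted})


def apply_liger_kernel_to_llama(rope=False, rms_norm=False, swiglu=False):
    # Stand-in for a per-model patch entry point; a real one would monkey-patch
    # the model's modules rather than return a dict.
    return {"rope": rope, "rms_norm": rms_norm, "swiglu": swiglu}


enabled = apply_patch(
    apply_liger_kernel_to_llama,
    rope=True,
    rms_norm=True,
    swiglu=True,
    fused_linear_cross_entropy=True,  # dropped: not in the stand-in's signature
)
```

The unknown flag is silently filtered out instead of raising a `TypeError`, which is the forward-compatibility property the commit message describes.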

@winglian winglian changed the title upgrade liger to 0.3.1 upgrade liger to 0.4.0 Nov 6, 2024
@winglian winglian merged commit 02ce520 into main Nov 7, 2024
14 checks passed
@winglian winglian deleted the upgrade-liger branch November 7, 2024 17:53
@winglian winglian mentioned this pull request Nov 7, 2024
bursteratom pushed a commit that referenced this pull request Nov 18, 2024
* upgrade liger to 0.3.1

* update docs and example

* skip duplicate code check

* Update src/axolotl/integrations/liger/args.py

Co-authored-by: NanoCode012 <[email protected]>

* Update README.md

Co-authored-by: NanoCode012 <[email protected]>

* add logging

* chore: lint

* add test case

* upgrade liger and transformers

* also upgrade accelerate

* use kwargs to support patch release

* make sure prepared path is empty for test

* use transformers 4.46.1 since 4.46.2 breaks fsdp

---------

Co-authored-by: NanoCode012 <[email protected]>
djsaunde pushed a commit that referenced this pull request Dec 17, 2024
(same commit list as above)