Draft: Adding GPT2 support to Adaption prompts #2440

Open · wants to merge 8 commits into main

Conversation

efraimdahl

Llama-Adapters, despite what the name suggests, are model-agnostic. This contribution adds an adjustment to the llama-adapter implementation to support GPT2 models. Currently this is achieved through an additional class, AdaptedAttentionGPT, that wraps the attention layer of GPT2-type models and handles the differences in attention calculation and in the forward function's input format between Llama and GPT transformers.
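
To make the idea concrete, here is a minimal, deliberately simplified sketch of the wrapper pattern described above. The names (GatedPromptAttentionWrapper, adaption_prompt, adaption_gate) and the single-head attention over the prompt tokens are illustrative assumptions, not the PR's actual AdaptedAttentionGPT code (a real implementation would reuse the block's own query projection and multi-head layout).

import math

import torch
import torch.nn as nn


class GatedPromptAttentionWrapper(nn.Module):
    """Simplified illustration of the Llama-Adapter idea on a GPT2-style block:
    learned prompt tokens are attended to by the incoming hidden states, and the
    result is added through a zero-initialized gate, so a freshly initialized
    adapter leaves the base model's output unchanged."""

    def __init__(self, attn_module: nn.Module, hidden_size: int, adapter_len: int = 10):
        super().__init__()
        self.attn = attn_module  # the original GPT2 attention layer
        # Learned "adaption prompt" tokens.
        self.adaption_prompt = nn.Parameter(torch.randn(1, adapter_len, hidden_size))
        # Zero-initialized gate: at init the adapter contributes nothing.
        self.adaption_gate = nn.Parameter(torch.zeros(1))

    def forward(self, hidden_states, *args, **kwargs):
        # Run the wrapped attention untouched; GPT2 blocks return a tuple
        # (attn_output, present, ...), unlike the Llama attention signature.
        outputs = self.attn(hidden_states, *args, **kwargs)
        attn_output = outputs[0]

        # Attend from the hidden states to the learned prompt tokens
        # (single-head and without the block's query projection, for brevity).
        prompt = self.adaption_prompt.expand(hidden_states.size(0), -1, -1)
        scores = hidden_states @ prompt.transpose(1, 2) / math.sqrt(hidden_states.size(-1))
        prompt_out = torch.softmax(scores, dim=-1) @ prompt

        attn_output = attn_output + self.adaption_gate * prompt_out
        return (attn_output,) + outputs[1:]

The zero-initialized gate is what ties into the requirement discussed below: before any training step, the wrapped attention returns exactly the base model's output.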

Currently I am testing that the learning behavior of this implementation is as expected by comparing similar Llama and GPT configurations on the same datasets. It passes initial tests for saving, loading, and passing data.

Llama-Adapter requires that the freshly initialized adapter does not change the generation of the base model.
I am having trouble testing for this non-invasiveness as mentioned here: the following test fails with or without the adapter, and I am looking for alternative ways to test this (a possible shape for such a check is sketched below, after the snippet).

import random

import numpy as np
import torch
from torch.testing import assert_close
from transformers import GPT2Config, GPT2Model

device = "cuda" if torch.cuda.is_available() else "cpu"

config = GPT2Config(
    vocab_size=16,
    hidden_size=8,  # mapped to n_embd
    num_hidden_layers=8,  # mapped to n_layer
    num_attention_heads=4,  # mapped to n_head
    use_cache=True,
    attn_implementation="eager",
)
input_ids = torch.LongTensor([[1, 1, 1], [2, 1, 2]]).to(device)
target_ids = torch.LongTensor([[0, 0, 0], [0, 0, 0]]).to(device)
attention_mask = torch.LongTensor([[1, 1, 1], [1, 0, 1]]).to(device)
torch.manual_seed(42)
np.random.seed(42)
random.seed(42)

# Create a GPT2 model and compare two forward passes on the same inputs.
model_gpt2 = GPT2Model(config)
model_gpt2 = model_gpt2.to(device)
a = model_gpt2(input_ids=input_ids, attention_mask=attention_mask)
b = model_gpt2(input_ids=input_ids, attention_mask=attention_mask)
assert_close(a.last_hidden_state, b.last_hidden_state, rtol=0, atol=0)  # fails: the two passes differ
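
One possible shape for the non-invasiveness check, as a hedged sketch: wrap the base model with a freshly initialized adaption-prompt adapter and compare logits before and after wrapping. This assumes the GPT2 support added in this PR; whether target_modules must be set explicitly for GPT2 (and which value it takes) is an assumption, and the model is put in eval mode so dropout does not mask the comparison.

from peft import AdaptionPromptConfig, get_peft_model
from transformers import GPT2LMHeadModel

base = GPT2LMHeadModel(config).to(device).eval()
with torch.no_grad():
    logits_base = base(input_ids=input_ids, attention_mask=attention_mask).logits

# Freshly initialized adapter: the zero-initialized gate should leave outputs unchanged.
peft_config = AdaptionPromptConfig(adapter_len=4, adapter_layers=2, task_type="CAUSAL_LM")
adapted = get_peft_model(base, peft_config)
with torch.no_grad():
    logits_adapted = adapted(input_ids=input_ids, attention_mask=attention_mask).logits

torch.testing.assert_close(logits_base, logits_adapted, rtol=1e-6, atol=1e-6)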

@BenjaminBossan
Member

Thanks a lot for the PR. I haven't checked the details yet, but regarding your testing question, the missing piece is that you need to set the seed before each call. Here is code that passes:

import torch
from transformers import GPT2Config, GPT2Model

device = 0
config = GPT2Config(
    vocab_size=16,
    hidden_size=8,  # mapped to n_embd
    num_hidden_layers=8,  # mapped to n_layer
    num_attention_heads=4,  # mapped to n_head
    use_cache=True,
    attn_implementation="eager",
)
input_ids = torch.LongTensor([[1, 1, 1], [2, 1, 2]]).to(device)
target_ids = torch.LongTensor([[0, 0, 0], [0, 0, 0]]).to(device)
attention_mask = torch.LongTensor([[1, 1, 1], [1, 0, 1]]).to(device)

# Create and compare gpt2 model outputs
model_gpt2 = GPT2Model(config)
model_gpt2 = model_gpt2.to(device)
torch.manual_seed(42)
a = model_gpt2(input_ids=input_ids, attention_mask=attention_mask)
torch.manual_seed(42)  # <================= important
b = model_gpt2(input_ids=input_ids, attention_mask=attention_mask)
torch.testing.assert_close(a.last_hidden_state, b.last_hidden_state, rtol=1e-6, atol=1e-6)
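
For context, the reseeding is needed because GPT2Config enables dropout by default (resid_pdrop, embd_pdrop, and attn_pdrop are all 0.1) and a freshly constructed model is in training mode, so two forward passes sample different dropout masks. If the test only needs deterministic outputs rather than train-mode behavior, an alternative (a sketch reusing the variables from the snippet above) is to switch the model to eval mode:

model_gpt2.eval()  # disables dropout, so repeated forward passes match
with torch.no_grad():
    a = model_gpt2(input_ids=input_ids, attention_mask=attention_mask)
    b = model_gpt2(input_ids=input_ids, attention_mask=attention_mask)
torch.testing.assert_close(a.last_hidden_state, b.last_hidden_state, rtol=1e-6, atol=1e-6)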

github-actions (bot)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@BenjaminBossan
Member

@efraimdahl are you still working on this?

@efraimdahl
Author

@efraimdahl are you still working on this?
Cheers @BenjaminBossan, thank you for checking in. I am still planning on completing this. I am currently experimenting with projecting outside conditioning into the model (to fine-tune unimodal LLMs for multi-modal reasoning) and trying to verify that it behaves as described in the paper.
It will take a while until I finish this. Let me know if it's easier to close this PR for now and reopen it at a later point, and whether I should separate the GPT2 implementation from the conditioning projection.

@BenjaminBossan
Member

@efraimdahl No hurry, I was just checking, as sometimes people just forget about their PRs. No need to close this one. As to separating PRs, yes, it's always a good idea to keep them small.

github-actions (bot)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@github-actions github-actions bot closed this May 25, 2025
@githubnemo githubnemo reopened this May 26, 2025