Draft: Adding GPT2 support to Adaption prompts #2440
Conversation
Thanks a lot for the PR. I haven't checked the details yet, but regarding your testing question, the missing piece was that you need to set the seed for each generate call. Here is code that passes:
import torch
from transformers import GPT2Config, GPT2Model
device = 0 if torch.cuda.is_available() else "cpu"
config = GPT2Config(
    vocab_size=16,
    hidden_size=8,          # mapped to n_embd
    num_hidden_layers=8,    # mapped to n_layer
    num_attention_heads=4,  # mapped to n_head
    use_cache=True,
    attn_implementation="eager",
)
input_ids = torch.LongTensor([[1, 1, 1], [2, 1, 2]]).to(device)
target_ids = torch.LongTensor([[0, 0, 0], [0, 0, 0]]).to(device)
attention_mask = torch.LongTensor([[1, 1, 1], [1, 0, 1]]).to(device)
# Create and compare gpt2 model outputs
model_gpt2 = GPT2Model(config)
model_gpt2 = model_gpt2.to(device)
torch.manual_seed(42)
a = model_gpt2(input_ids=input_ids, attention_mask=attention_mask)
torch.manual_seed(42) # <================= important
b = model_gpt2(input_ids=input_ids, attention_mask=attention_mask)
torch.testing.assert_close(a.last_hidden_state, b.last_hidden_state, rtol=1e-6, atol=1e-6)
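The re-seed matters because a freshly constructed GPT2Model is in training mode, so dropout makes two otherwise identical forward passes differ. A minimal alternative sketch, reusing model_gpt2 and the inputs from above, is to disable dropout instead of re-seeding:
# alternative: put the model in eval mode so dropout is disabled and no re-seeding is needed
model_gpt2.eval()
a = model_gpt2(input_ids=input_ids, attention_mask=attention_mask)
b = model_gpt2(input_ids=input_ids, attention_mask=attention_mask)
torch.testing.assert_close(a.last_hidden_state, b.last_hidden_state, rtol=1e-6, atol=1e-6)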
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
@efraimdahl are you still working on this?
@efraimdahl No hurry, I was just checking, as sometimes people just forget about their PRs. No need to close this one. As to separating PRs, yes, it's always a good idea to keep them small.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Llama-Adapters, unlike the name suggests, are model-agnostic. This contribution adds an adjustment to the llama-adapter implementation to support GPT2 models. Currently this is achieved through an additional class, AdaptedAttentionGPT, that wraps the attention layer of GPT2-type models and handles the differences in attention calculation and in the input format of the forward function between Llama and GPT transformers. I am currently testing that the learning behavior of this implementation is as expected, comparing similar Llama and GPT configurations on the same datasets. It passes initial tests for saving/loading/passing data.
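Roughly, such a wrapper has the shape sketched below. This is an illustrative sketch rather than the exact code in this PR: it mirrors the zero-initialized gating and the adaption_prompt/adaption_gate parameters of PEFT's existing AdaptedAttention, but adapts them to GPT2's fused c_attn/c_proj projections; everything else (argument names, defaults) is an assumption.
import math
import torch
import torch.nn as nn

class AdaptedAttentionGPT(nn.Module):
    """Sketch of a llama-adapter style wrapper around a GPT2Attention module."""

    def __init__(self, attn, adapter_len=10):
        super().__init__()
        self.model = attn  # the original GPT2Attention module
        self.adapter_len = adapter_len
        embed_dim = attn.embed_dim
        # learnable soft prompt attended to by every query position
        self.adaption_prompt = nn.Parameter(torch.empty(1, adapter_len, embed_dim).normal_())
        # zero-initialized gate: at init the wrapped layer equals the base layer
        self.adaption_gate = nn.Parameter(torch.zeros(1))

    def forward(self, hidden_states, **kwargs):
        # run the original GPT2 attention unchanged (keeps its own signature and outputs)
        outputs = self.model(hidden_states, **kwargs)
        attn_output = outputs[0]

        bsz, q_len, _ = hidden_states.shape
        num_heads, head_dim = self.model.num_heads, self.model.head_dim

        # GPT2 uses a single fused c_attn projection for q, k and v
        query, _, _ = self.model.c_attn(hidden_states).split(self.model.split_size, dim=2)
        _, adapter_k, adapter_v = self.model.c_attn(self.adaption_prompt).split(
            self.model.split_size, dim=2
        )

        def split_heads(x):
            bs, seq_len, _ = x.shape
            return x.view(bs, seq_len, num_heads, head_dim).transpose(1, 2)

        query = split_heads(query)          # (bsz, heads, q_len, head_dim)
        adapter_k = split_heads(adapter_k)  # (1, heads, adapter_len, head_dim), broadcast over batch
        adapter_v = split_heads(adapter_v)  # (1, heads, adapter_len, head_dim)

        # attention of the queries over the adapter tokens only, scaled by the gate
        scores = torch.matmul(query, adapter_k.transpose(2, 3)) / math.sqrt(head_dim)
        scores = self.adaption_gate * torch.softmax(scores, dim=-1, dtype=torch.float32).to(query.dtype)
        adapter_out = torch.matmul(scores, adapter_v).transpose(1, 2).reshape(bsz, q_len, -1)
        adapter_out = self.model.c_proj(adapter_out)

        # add the gated adapter contribution to the original attention output
        return (attn_output + adapter_out,) + outputs[1:]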
Llama-Adapter requires that the initialized adapter does not change the generation of the base model.
I am having trouble testing for the non-invasiveness of the adapter as mentioned here; the following test would fail with or without the adapter. I am looking for alternative ways to test this.
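For reference, a minimal sketch of what such a non-invasiveness test could look like once GPT2 is wired into AdaptionPromptConfig: the tiny config mirrors the snippet above, eval() sidesteps the dropout issue, and target_modules="attn" is only a guess at how this PR exposes GPT2's attention module.
import torch
from transformers import GPT2Config, GPT2LMHeadModel
from peft import AdaptionPromptConfig, get_peft_model

config = GPT2Config(vocab_size=16, hidden_size=8, num_hidden_layers=8, num_attention_heads=4,
                    use_cache=True, attn_implementation="eager")
base_model = GPT2LMHeadModel(config).eval()  # eval() disables dropout, so no re-seeding needed

input_ids = torch.LongTensor([[1, 1, 1], [2, 1, 2]])
attention_mask = torch.LongTensor([[1, 1, 1], [1, 0, 1]])

with torch.no_grad():
    expected = base_model(input_ids=input_ids, attention_mask=attention_mask).logits

# target_modules="attn" is an assumption about how GPT2's attention module would be registered
peft_config = AdaptionPromptConfig(adapter_len=4, adapter_layers=2,
                                   task_type="CAUSAL_LM", target_modules="attn")
peft_model = get_peft_model(base_model, peft_config).eval()

with torch.no_grad():
    actual = peft_model(input_ids=input_ids, attention_mask=attention_mask).logits

# with zero-initialized gates, the freshly initialized adapter must not change the outputs
torch.testing.assert_close(actual, expected, rtol=1e-6, atol=1e-6)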