Can anyone tell me why, when training a GPT-2 model to solve math problems from the GSM8K dataset, this code makes the loss computable on the first part of the input sequence (the part that contains the question)? Why should we compute the GPT-2 loss on the question tokens? Don't we only need to compute the loss on the generated answer? Thanks!
import torch as th

class GSMDataset(th.utils.data.Dataset):
    def __init__(self, tokenizer, examples, loss_on_prefix=True):
        self.examples = examples
        self.qns = [ex["question"] for ex in self.examples]
        self.ans = [ex["answer"] for ex in self.examples]
        self.qns = tokenizer(self.qns, padding=False)
        self.ans = tokenizer(self.ans, padding=False)
        # When True, the question (prefix) tokens also contribute to the loss.
        self.loss_on_prefix = loss_on_prefix
        # Longest question+answer pair, used later for padding.
        self.max_len = max(
            [
                len(self.qns["input_ids"][i]) + len(self.ans["input_ids"][i])
                for i in range(len(self.examples))
            ]
        )
        print(f"Max tokens: {self.max_len}")
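For context on what the flag controls: here is a minimal sketch of how a `loss_on_prefix` flag is typically consumed in `__getitem__`. This method body is my illustration, not part of the original class, and it assumes the standard Hugging Face convention that label positions set to -100 are ignored by the cross-entropy loss, so masking the question tokens this way would restrict the loss to the answer.

    # Hypothetical __getitem__ (illustration only, not the original code).
    def __getitem__(self, idx):
        qn_ids = self.qns["input_ids"][idx]
        ans_ids = self.ans["input_ids"][idx]
        input_ids = th.tensor(qn_ids + ans_ids)
        labels = input_ids.clone()
        if not self.loss_on_prefix:
            # -100 labels are ignored by the loss, so only answer
            # tokens would contribute to the gradient.
            labels[: len(qn_ids)] = -100
        return dict(input_ids=input_ids, labels=labels)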