Hi,
In your code, the following line decides whether the null text vector is used:
is_empty_text = torch.logical_not(input['condition_mask'][:, 2]).unsqueeze(1).unsqueeze(2).repeat(1, 77, 512)
However, I found that if every caption has a length larger than 2, is_empty_text will always be False. So how is classifier-free guidance controlled? Do we need to add some <image, null text> pairs to the training dataset?
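To make the observation concrete, here is a minimal sketch of what I see. It assumes that `condition_mask` is a boolean mask over token positions (True for real tokens, False for padding) and that a null caption has only 2 valid tokens (start and end); both are my guesses, not something I confirmed in the code. `make_mask` and the caption lengths are hypothetical and only used for illustration.

```python
import torch

def build_is_empty_text(condition_mask, seq_len=77, dim=512):
    # Same expression as the line quoted above: a caption is treated as
    # "empty" when the token at index 2 is masked out.
    return (
        torch.logical_not(condition_mask[:, 2])
        .unsqueeze(1)
        .unsqueeze(2)
        .repeat(1, seq_len, dim)
    )

def make_mask(lengths, seq_len=77):
    # Hypothetical helper: True for the first `length` positions of each row.
    positions = torch.arange(seq_len).unsqueeze(0)
    return positions < torch.tensor(lengths).unsqueeze(1)

# Assumed token counts per caption; the second row mimics a null caption
# that contains only the start and end tokens.
mask = make_mask([10, 2, 5])
is_empty_text = build_is_empty_text(mask)

print(is_empty_text.shape)     # one flag per sample, broadcast to (batch, 77, 512)
print(is_empty_text[:, 0, 0])  # only the length-2 row is flagged as empty
```

Under these assumptions, only samples whose caption is already empty are flagged, so a dataset without such samples would never train the unconditional branch.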