I observed in the training output logs that the captions did not always include the autocaption_prefix or autocaption_suffix. This is pretty important, since the prefix/suffix contain the unique trigger token.
After looking at caption.py I noticed this block:
if autocaption_prefix:
    inp += f"\n\nYou must start the caption with '{autocaption_prefix}'. "
if autocaption_suffix:
    inp += f"\n\nYou must end the caption with '{autocaption_suffix}'."
Instead of relying on the LLM to add these, which it is clearly failing to do, I suggest appending them to the decoded output directly:
output = self.tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
if autocaption_prefix:
    output = f"{autocaption_prefix} {output}"
if autocaption_suffix:
    output = f"{output} {autocaption_suffix}"
print(f"Caption for {image_path}: {output}")
I am seeing fewer than 20% of captions include the prefix or suffix. My workaround for now: let autocaptioning run, cancel the job as soon as it finishes, pull the captions out of the logs, manually insert my prefix into each one, write each caption to a matching .txt file, then start over with autocaptioning turned off and a new zip file that includes the .txt files. A sketch of scripting that log-scraping step follows below.
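For anyone on the same version, the manual step can be scripted. A rough sketch, assuming the log contains one "Caption for <image_path>: <caption>" line per image (as in the print statement above), image paths contain no spaces, and training.log and the PREFIX value are placeholders for your own log file and trigger token:

import re
from pathlib import Path

PREFIX = "TOK"  # your trigger token / autocaption_prefix
log_text = Path("training.log").read_text()

# Match the "Caption for <image_path>: <caption>" lines from the log.
for match in re.finditer(r"Caption for (\S+): (.+)", log_text):
    image_path, caption = match.groups()
    caption = caption.strip()
    if not caption.startswith(PREFIX):
        caption = f"{PREFIX} {caption}"
    # Write a .txt next to the image with the same stem, e.g. img001.txt.
    Path(image_path).with_suffix(".txt").write_text(caption + "\n")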