How to use the tokenizer? #16

bhosalems · 2023-11-21T04:19:12Z

Thanks for releasing the model on Hugging Face.

I want to use the text encoder, so I need to tokenize the input first. How should I use the tokenizer? Can I use the one from `CLIPProcessor`?

```python
from transformers import CLIPProcessor
processor = CLIPProcessor.from_pretrained("vinid/plip")
tokenizer = processor.tokenizer
```

But with this, `model_max_length` is set to an absurdly high value: 1000000000000000019884624838656.
So I am wondering whether this is the correct way to use it.

```
CLIPTokenizerFast(name_or_path='vinid/plip', vocab_size=49408, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|startoftext|>', 'eos_token': '<|endoftext|>', 'unk_token': '<|endoftext|>', 'pad_token': '<|endoftext|>'}, clean_up_tokenization_spaces=True), added_tokens_decoder={
	49406: AddedToken("<|startoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
	49407: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
}
```
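The `model_max_length` shown above is exactly `int(1e30)`. `transformers` uses that value as a placeholder (`VERY_LARGE_INTEGER`) when a tokenizer's saved config does not declare a maximum length, so it signals "unset" rather than a real limit of the text encoder. The sketch below shows one way to replace the placeholder with a usable length. It assumes a fallback of 77 tokens, the standard CLIP text context length; `effective_max_length` and `UNSET_MAX_LENGTH` are hypothetical names, and the real value for a given checkpoint should be read from `model.config.text_config.max_position_embeddings`.

```python
# Mirrors the VERY_LARGE_INTEGER placeholder that transformers assigns when a
# tokenizer config does not specify model_max_length.
UNSET_MAX_LENGTH = int(1e30)


def effective_max_length(model_max_length: int, fallback: int = 77) -> int:
    """Return a usable max length, replacing the 'unset' placeholder.

    fallback=77 is an assumption (the standard CLIP text context length);
    confirm it against the checkpoint's text config.
    """
    if model_max_length >= UNSET_MAX_LENGTH:
        return fallback
    return model_max_length


print(UNSET_MAX_LENGTH)  # 1000000000000000019884624838656
print(effective_max_length(1000000000000000019884624838656))  # 77
print(effective_max_length(64))  # 64
```

Passing the result as `max_length=` together with `truncation=True` when calling the tokenizer keeps inputs within what the text encoder's position embeddings can handle.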

huangzhii · 2024-03-26T01:41:02Z

Please provide a minimal reproducible code example so we can assist.
Here is a good tutorial on how to prepare a minimal, reproducible code example:
https://stackoverflow.com/help/minimal-reproducible-example
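A minimal sketch of tokenizing text and running the PLIP text encoder with `transformers`. It assumes the `vinid/plip` repo (the same one `CLIPProcessor` is loaded from in the question) also loads as a `CLIPModel`, and that `torch` is installed; `encode_texts` is a hypothetical helper name. Rather than trusting the tokenizer's `model_max_length`, it takes the context length from `model.config.text_config.max_position_embeddings`.

```python
def encode_texts(texts, model_id="vinid/plip"):
    """Tokenize `texts` and return text embeddings from the CLIP text encoder.

    Heavy imports happen inside the function so this snippet can be imported
    without torch/transformers; calling it downloads the checkpoint.
    """
    import torch
    from transformers import CLIPModel, CLIPProcessor

    processor = CLIPProcessor.from_pretrained(model_id)
    model = CLIPModel.from_pretrained(model_id)
    model.eval()

    # The tokenizer's model_max_length is an "unset" placeholder here, so use
    # the text encoder's actual number of position embeddings instead.
    max_length = model.config.text_config.max_position_embeddings

    inputs = processor.tokenizer(
        texts,
        padding=True,           # pad to the longest text in the batch
        truncation=True,        # cut texts longer than max_length
        max_length=max_length,
        return_tensors="pt",
    )
    with torch.no_grad():
        # Projected text embeddings, one row per input string.
        return model.get_text_features(**inputs)
```

For example, `encode_texts(["a histopathology image"])` should return a tensor with one row per input string, whose width is the checkpoint's `projection_dim`.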
