# OpenAI Token Counter

A plain Python script to compare and count tokens for OpenAI models, providing insight into how various languages tokenize compared to English.
## Prerequisites

To run the script, the following Python libraries must be installed:

- `tiktoken`
- `rich`
- `tabulate`

Install them with:

```
pip install tiktoken rich tabulate
```
## Usage

1. **Clone the repository.**
2. **Execute the script:**

   ```
   python GPTTokencounter.py
   ```

3. **Select a language.** The program opens with a brief overview of its purpose, then prompts you to choose:

   - `[a]` Default (English and Bangla)
   - `[b]` Custom

   Choose the default to compare English and Bangla, or opt for a custom language pair.

4. **Provide the required inputs.** For the custom option, specify the ISO language code, an English word or sentence, and its counterpart in the chosen language.

5. **Examine the token comparison.** The script displays a table contrasting the tokenization of the English phrase with that of the selected language, using the `gpt-3.5-turbo` model.
Example:

| Language | English         | Bangla              |
|----------|-----------------|---------------------|
| Sentence | I speak Bengali | আমি বাংলায় কথা কই   |
## Notes

- The `tiktoken` library is the backbone, counting tokens according to the selected model.
- The default model is `gpt-3.5-turbo`. To experiment with other models, adjust the `model_name` variable within the `main()` function.
- The `display_word_parameters()` function applies word wrapping, ensuring legibility for lengthier inputs.
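The wrapping behavior can be sketched with Python's standard `textwrap` module; this is a hypothetical stand-in for what `display_word_parameters()` does internally, not the script's actual implementation:

```python
# Minimal sketch of word wrapping for long inputs, assuming behavior similar
# to the script's display_word_parameters(); wrap_cell is an illustrative name.
import textwrap


def wrap_cell(text: str, width: int = 30) -> str:
    """Wrap long input onto multiple lines so table cells stay legible."""
    return "\n".join(textwrap.wrap(text, width=width)) or text


long_input = (
    "This is a fairly long English sentence that would "
    "overflow a narrow table column."
)
print(wrap_cell(long_input))
```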
## Feedback

We welcome feedback! If you come across any hiccups, do reach out through GitHub.