taylorn-ai

This PR adds support for newly released OpenAI model variants:

  • gpt-4.1, gpt-4.1-mini, gpt-4.1-nano
  • gpt-4.5, gpt-4.5-mini, gpt-4.5-nano

tiktoken.encoding_for_model() does not currently recognize these models and raises a KeyError when given any of them. All of them use the o200k_base tokenizer, so this patch maps them to that encoding.

Changes:

  • Updated MODEL_TO_ENCODING for bare-name support (gpt-4.1, gpt-4.5)
  • Updated MODEL_PREFIX_TO_ENCODING for all suffix variants (gpt-4.1-*, gpt-4.5-*)
  • Extended test_encoding_for_model in test_misc.py to validate correct encoding resolution
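
For readers unfamiliar with the two tables, here is a minimal sketch of the exact-then-prefix lookup the changes above rely on. It is not tiktoken's source: the dictionaries are trimmed stand-ins that reuse the real table names, and `resolve_encoding_name` is a hypothetical helper written only for illustration.

```python
# Illustrative stand-ins for tiktoken's mapping tables. Only entries relevant
# to this PR are shown; the real tables contain many more models.
MODEL_TO_ENCODING = {
    "gpt-4o": "o200k_base",
    "gpt-4.1": "o200k_base",  # bare name added by this PR
    "gpt-4.5": "o200k_base",  # bare name added by this PR
}

MODEL_PREFIX_TO_ENCODING = {
    "gpt-4o-": "o200k_base",
    "gpt-4.1-": "o200k_base",  # covers gpt-4.1-mini, gpt-4.1-nano
    "gpt-4.5-": "o200k_base",  # covers gpt-4.5-mini, gpt-4.5-nano
}


def resolve_encoding_name(model_name: str) -> str:
    """Hypothetical helper: try an exact match first, then a prefix match."""
    if model_name in MODEL_TO_ENCODING:
        return MODEL_TO_ENCODING[model_name]
    for prefix, encoding_name in MODEL_PREFIX_TO_ENCODING.items():
        if model_name.startswith(prefix):
            return encoding_name
    # Unknown models are reported with a KeyError in this sketch.
    raise KeyError(f"Could not map {model_name!r} to an encoding")
```

The bare names need their own entries because "gpt-4.1" does not start with the prefix "gpt-4.1-", which is why both tables are updated.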

Notes

  • These changes follow the existing mapping pattern for gpt-4o and similar models.
  • One unrelated test (test_hyp_roundtrip[cl100k_base]) fails because of tokenizer restrictions on special tokens (e.g., <|endofprompt|>); that failure is not caused by this PR.

Taylor added 2 commits June 5, 2025 05:46
- Add support for "gpt-4.1" and "gpt-4.5" with "o200k_base" encoding
- Refactor test to include multiple models for encoding validation
@taylorn-ai
Author

I didn't notice #396, but this PR handles 4.1 and 4.5 and related tests.

@taylorn-ai
Author

@hauntsaninja FYI

@NazimHAli

+1 bump

@cedricvidal

+1

taylorn-ai requested a review from jbaremoney July 21, 2025 18:18
jonmclean mentioned this pull request Jul 23, 2025
@jonmclean

@hauntsaninja ping on this PR. I was about to make the same change when I found this PR. Right now I have to work around it by hard-coding logic for GPT-4.1 in my client code.
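
A client-side workaround of the kind described above might look like the sketch below. It is only an assumption about what such hard-coding could be, not jonmclean's actual code: `encoding_name_with_fallback`, `FALLBACK_PREFIXES`, and `FALLBACK_ENCODING` are hypothetical names, and the resolver is passed in so the snippet runs without tiktoken installed.

```python
from typing import Callable

# Assumption taken from the PR description: these model families use o200k_base.
FALLBACK_PREFIXES = ("gpt-4.1", "gpt-4.5")
FALLBACK_ENCODING = "o200k_base"


def encoding_name_with_fallback(
    model_name: str,
    resolve: Callable[[str], str],
) -> str:
    """Hypothetical wrapper: use the library lookup, fall back for known gaps."""
    try:
        return resolve(model_name)
    except KeyError:
        # Only paper over the models this PR is about; re-raise for the rest.
        if model_name.startswith(FALLBACK_PREFIXES):
            return FALLBACK_ENCODING
        raise
```

With tiktoken installed, `resolve` would presumably be `tiktoken.encoding_name_for_model`, and the returned name could be passed to `tiktoken.get_encoding`. Keeping the fallback narrow means genuinely unknown models still fail loudly.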

@taylorn-ai
Author

> @hauntsaninja ping on this PR. I was about to make the same change when I found this PR. Right now I have to work around it by hard-coding logic for GPT-4.1 in my client code.

I have merged my changes into the main branch of my fork. You could install from there if you'd like?

pip install https://github.com/TaylorN15/tiktoken/archive/main.zip

@EricAveritt

+1

@javvarcar

+1

@hauntsaninja
Collaborator

This is fixed in recent versions of tiktoken

@taylorn-ai
Author

So instead of merging the PR you just rewrote it yourself? Seems like a waste of time but ok.

taylorn-ai deleted the feature/gpt-4.x-encoding branch August 9, 2025 01:05