
Commit be1f118

added sharptoken as example

1 parent ebfdfe3 commit be1f118

File tree

1 file changed (+4, -19 lines)

examples/How_to_count_tokens_with_tiktoken.ipynb

Lines changed: 4 additions & 19 deletions
@@ -1,7 +1,6 @@
 {
  "cells": [
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -35,8 +34,9 @@
     "\n",
     "## Tokenizer libraries by language\n",
     "\n",
-    "For `cl100k_base` and `p50k_base` encodings, `tiktoken` is the only tokenizer available as of March 2023.\n",
+    "For `cl100k_base` and `p50k_base` encodings:\n",
     "- Python: [tiktoken](https://github.com/openai/tiktoken/blob/main/README.md)\n",
+    "- .NET / C#: [SharpToken](https://github.com/dmitry-brazhenko/SharpToken)\n",
     "\n",
     "For `r50k_base` (`gpt2`) encodings, tokenizers are available in many languages.\n",
     "- Python: [tiktoken](https://github.com/openai/tiktoken/blob/main/README.md) (or alternatively [GPT2TokenizerFast](https://huggingface.co/docs/transformers/model_doc/gpt2#transformers.GPT2TokenizerFast))\n",
@@ -54,7 +54,6 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -88,7 +87,6 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -105,7 +103,6 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -126,7 +123,6 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -143,7 +139,6 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -152,7 +147,6 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -180,7 +174,6 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -221,15 +214,13 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
     "## 4. Turn tokens into text with `encoding.decode()`"
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -257,15 +248,13 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
     "Warning: although `.decode()` can be applied to single tokens, beware that it can be lossy for tokens that aren't on utf-8 boundaries."
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -293,15 +282,13 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
     "(The `b` in front of the strings indicates that the strings are byte strings.)"
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -424,7 +411,6 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -549,7 +535,7 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "openai",
+   "display_name": "Python 3",
    "language": "python",
    "name": "python3"
   },
@@ -563,9 +549,8 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.9.9"
+   "version": "3.7.3"
   },
-  "orig_nbformat": 4,
   "vscode": {
    "interpreter": {
     "hash": "365536dcbde60510dc9073d6b991cd35db2d9bac356a11f5b64279a5e6708b97"
