Commit 3bf700f

docs: readme
Signed-off-by: thxCode <[email protected]>
1 parent: 65a9ff8

1 file changed: +20 -19 lines changed

README.md

Lines changed: 20 additions & 19 deletions
@@ -47,7 +47,7 @@ GGUF Parser helps in reviewing and estimating the usage of a GGUF format model w
 ## Notes
 
 - Since v0.7.2, GGUF Parser supports retrieving the model's metadata via split file,
-  which suffixes with "-00001-of-00009.gguf".
+  which suffixes with something like `-00001-of-00009.gguf`.
 - The table result `UMA` indicates the memory usage of Apple MacOS only.
 - Since v0.7.0, GGUF Parser is going to support estimating the usage of multiple GPUs.
   + The table result `RAM` means the system memory usage when
@@ -105,40 +105,41 @@ $ gguf-parser --path="~/.cache/lm-studio/models/NousResearch/Hermes-2-Pro-Mistra
 
 $ # Retrieve the model's metadata via split file,
 $ # which needs all split files has been downloaded.
+$ gguf-parser --path="~/.cache/lm-studio/models/Qwen/Qwen2-72B-Instruct-GGUF/qwen2-72b-instruct-q6_k-00001-of-00002.gguf"
 
-+-----------------------------------------------------------------------------------------------------------+
-| MODEL |
-+------------------------------+-------+--------------+---------------+------------+------------+-----------+
-| NAME | ARCH | QUANTIZATION | LITTLE ENDIAN | SIZE | PARAMETERS | BPW |
-+------------------------------+-------+--------------+---------------+------------+------------+-----------+
-| Meta Llama 3.1 405B Instruct | llama | BF16 | true | 763.84 GiB | 410.08 B | 16.00 bpw |
-+------------------------------+-------+--------------+---------------+------------+------------+-----------+
++-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| MODEL |
++----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+--------------+---------------+-----------+------------+----------+
+| NAME | ARCH | QUANTIZATION | LITTLE ENDIAN | SIZE | PARAMETERS | BPW |
++----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+--------------+---------------+-----------+------------+----------+
+| 72b.5000B--cmix31-base100w-cpt32k_mega_v1_reflection_4_identity_2_if_ondare_beta0.09_lr_1e-6_bs128_epoch2-72B.qwen2B-bf16-mp8-pp4-lr-1e-6-minlr-1e-9-bs-128-seqlen-4096-step1350 | qwen2 | IQ1_S/Q6_K | true | 59.92 GiB | 72.71 B | 7.08 bpw |
++----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+--------------+---------------+-----------+------------+----------+
 
 +---------------------------------------------------------------------------------------------------------------------------------------------------+
 | ARCHITECTURE |
 +-----------------+---------------+---------------+------------------+--------------------+--------+------------------+------------+----------------+
 | MAX CONTEXT LEN | EMBEDDING LEN | EMBEDDING GQA | ATTENTION CAUSAL | ATTENTION HEAD CNT | LAYERS | FEED FORWARD LEN | EXPERT CNT | VOCABULARY LEN |
 +-----------------+---------------+---------------+------------------+--------------------+--------+------------------+------------+----------------+
-| 131072 | 16384 | 8 | true | 128 | 126 | 53248 | 0 | 128256 |
+| 32768 | 8192 | 8 | true | 64 | 80 | 29568 | 0 | 152064 |
 +-----------------+---------------+---------------+------------------+--------------------+--------+------------------+------------+----------------+
 
 +-------------------------------------------------------------------------------------------------------------------------------------------------------+
 | TOKENIZER |
 +-------+-------------+------------+------------------+-----------+-----------+-----------+-----------+---------------+-----------------+---------------+
 | MODEL | TOKENS SIZE | TOKENS LEN | ADDED TOKENS LEN | BOS TOKEN | EOS TOKEN | EOT TOKEN | EOM TOKEN | UNKNOWN TOKEN | SEPARATOR TOKEN | PADDING TOKEN |
 +-------+-------------+------------+------------------+-----------+-----------+-----------+-----------+---------------+-----------------+---------------+
-| gpt2 | 2 MiB | 128256 | N/A | 128000 | 128009 | N/A | N/A | N/A | N/A | N/A |
+| gpt2 | 2.47 MiB | 152064 | N/A | 151643 | 151645 | N/A | N/A | N/A | N/A | 151643 |
 +-------+-------------+------------+------------------+-----------+-----------+-----------+-----------+---------------+-----------------+---------------+
 
-+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-| ESTIMATE |
-+-------+--------------+--------------------+-----------------+-----------+----------------+----------------+----------------+-------------------------+----------------------+
-| ARCH | CONTEXT SIZE | BATCH SIZE (L / P) | FLASH ATTENTION | MMAP LOAD | EMBEDDING ONLY | OFFLOAD LAYERS | FULL OFFLOADED | RAM | VRAM 0 |
-| | | | | | | | +------------+------------+---------+------------+
-| | | | | | | | | UMA | NONUMA | UMA | NONUMA |
-+-------+--------------+--------------------+-----------------+-----------+----------------+----------------+----------------+------------+------------+---------+------------+
-| llama | 131072 | 2048 / 512 | Disabled | Supported | No | 127 (126 + 1) | Yes | 684.53 MiB | 834.53 MiB | 126 GiB | 919.55 GiB |
-+-------+--------------+--------------------+-----------------+-----------+----------------+----------------+----------------+------------+------------+---------+------------+
++---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| ESTIMATE |
++-------+--------------+--------------------+-----------------+-----------+----------------+----------------+----------------+-------------------------+--------------------+
+| ARCH | CONTEXT SIZE | BATCH SIZE (L / P) | FLASH ATTENTION | MMAP LOAD | EMBEDDING ONLY | OFFLOAD LAYERS | FULL OFFLOADED | RAM | VRAM 0 |
+| | | | | | | | +------------+------------+--------+-----------+
+| | | | | | | | | UMA | NONUMA | UMA | NONUMA |
++-------+--------------+--------------------+-----------------+-----------+----------------+----------------+----------------+------------+------------+--------+-----------+
+| qwen2 | 32768 | 2048 / 512 | Disabled | Supported | No | 81 (80 + 1) | Yes | 307.38 MiB | 457.38 MiB | 10 GiB | 73.47 GiB |
++-------+--------------+--------------------+-----------------+-----------+----------------+----------------+----------------+------------+------------+--------+-----------+
 
 ```
 
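The split-file convention mentioned in the Notes hunk follows a fixed `-%05d-of-%05d.gguf` numbering. As a rough sketch of how such a name can be recognized and expanded into the full shard list (an illustration only, not GGUF Parser's actual implementation; `splitShards` is a hypothetical helper):

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// splitRe matches split-file suffixes like "-00001-of-00009.gguf".
var splitRe = regexp.MustCompile(`^(.+)-(\d{5})-of-(\d{5})\.gguf$`)

// splitShards is a hypothetical helper: given one shard's path, it returns
// the paths of every shard in the split, or nil if the path is not a split file.
func splitShards(path string) []string {
	m := splitRe.FindStringSubmatch(path)
	if m == nil {
		return nil // not a split file
	}
	prefix := m[1]
	total, _ := strconv.Atoi(m[3])
	shards := make([]string, 0, total)
	for i := 1; i <= total; i++ {
		shards = append(shards, fmt.Sprintf("%s-%05d-of-%05d.gguf", prefix, i, total))
	}
	return shards
}

func main() {
	// Prints both shards of the two-way split used in the README example.
	for _, s := range splitShards("qwen2-72b-instruct-q6_k-00001-of-00002.gguf") {
		fmt.Println(s)
	}
}
```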
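The `BPW` column in the new MODEL row can be cross-checked from `SIZE` and `PARAMETERS`: bits per weight is total model bits divided by parameter count. A quick check of the Qwen2 numbers, assuming the table's GiB means 2^30 bytes and B means 10^9 parameters:

```go
package main

import "fmt"

func main() {
	// Values from the MODEL row: 59.92 GiB, 72.71 B parameters.
	sizeBits := 59.92 * (1 << 30) * 8 // GiB -> bytes -> bits
	params := 72.71e9
	fmt.Printf("%.2f bpw\n", sizeBits/params) // prints 7.08 bpw, matching the table
}
```

The same check on the removed Llama row gives 16.00 bpw, consistent with its BF16 quantization.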
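The 10 GiB `UMA` entry under `VRAM 0` also appears consistent with the full-context KV cache implied by the ARCHITECTURE row: with grouped-query attention, each of the 80 layers caches K and V at `EMBEDDING LEN / EMBEDDING GQA = 8192 / 8 = 1024` dimensions per token. A back-of-the-envelope check, assuming 16-bit K/V entries (an assumption on my part; the estimator may account for more than the KV cache):

```go
package main

import "fmt"

func main() {
	// Values from the ARCHITECTURE row.
	const (
		contextLen = 32768
		layers     = 80
		embedLen   = 8192
		gqa        = 8 // EMBEDDING GQA
		kvBytes    = 2 // assuming f16 K/V cache entries
	)
	kvDim := embedLen / gqa                            // 1024 dims each for K and V
	bytes := contextLen * layers * 2 * kvDim * kvBytes // *2 for K and V
	fmt.Printf("%.0f GiB\n", float64(bytes)/(1<<30))   // prints 10 GiB, matching UMA under VRAM 0
}
```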