@@ -47,7 +47,7 @@ GGUF Parser helps in reviewing and estimating the usage of a GGUF format model w
## Notes

- Since v0.7.2, GGUF Parser supports retrieving the model's metadata via split file,
-  which suffixes with "-00001-of-00009.gguf".
+  which is suffixed with something like `-00001-of-00009.gguf`.
- The table result `UMA` indicates the memory usage of Apple macOS only.
- Since v0.7.0, GGUF Parser supports estimating the usage of multiple GPUs.
+ The table result `RAM` means the system memory usage when
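A quick way to act on the split-file note above is to confirm that every shard is on disk before parsing. A minimal sketch, assuming the LM Studio cache layout used in the example below; the `-of-00002` shard count is read off that example's file name:

```shell
# Hypothetical pre-check, not a gguf-parser feature: count the shards of the
# split GGUF and compare against the total encoded in the suffix (2 here).
$ ls ~/.cache/lm-studio/models/Qwen/Qwen2-72B-Instruct-GGUF/ | grep -c -e '-of-00002\.gguf$'
2
```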
@@ -105,40 +105,41 @@ $ gguf-parser --path="~/.cache/lm-studio/models/NousResearch/Hermes-2-Pro-Mistra

$ # Retrieve the model's metadata via split file,
$ # which requires that all split files have been downloaded.
+ $ gguf-parser --path="~/.cache/lm-studio/models/Qwen/Qwen2-72B-Instruct-GGUF/qwen2-72b-instruct-q6_k-00001-of-00002.gguf"
- +-----------------------------------------------------------------------------------------------------------+
- |                                                   MODEL                                                   |
- +------------------------------+-------+--------------+---------------+------------+------------+-----------+
- |             NAME             | ARCH  | QUANTIZATION | LITTLE ENDIAN |    SIZE    | PARAMETERS |    BPW    |
- +------------------------------+-------+--------------+---------------+------------+------------+-----------+
- | Meta Llama 3.1 405B Instruct | llama | BF16         | true          | 763.84 GiB | 410.08 B   | 16.00 bpw |
- +------------------------------+-------+--------------+---------------+------------+------------+-----------+
+ +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+ |                                                                                                                            MODEL                                                                                                                            |
+ +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+--------------+---------------+-----------+------------+----------+
+ |                                                                                       NAME                                                                                       | ARCH  | QUANTIZATION | LITTLE ENDIAN |   SIZE    | PARAMETERS |   BPW    |
+ +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+--------------+---------------+-----------+------------+----------+
+ | 72b.5000B--cmix31-base100w-cpt32k_mega_v1_reflection_4_identity_2_if_ondare_beta0.09_lr_1e-6_bs128_epoch2-72B.qwen2B-bf16-mp8-pp4-lr-1e-6-minlr-1e-9-bs-128-seqlen-4096-step1350 | qwen2 | IQ1_S/Q6_K   | true          | 59.92 GiB | 72.71 B    | 7.08 bpw |
+ +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+--------------+---------------+-----------+------------+----------+

+---------------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                   ARCHITECTURE                                                                    |
+-----------------+---------------+---------------+------------------+--------------------+--------+------------------+------------+----------------+
| MAX CONTEXT LEN | EMBEDDING LEN | EMBEDDING GQA | ATTENTION CAUSAL | ATTENTION HEAD CNT | LAYERS | FEED FORWARD LEN | EXPERT CNT | VOCABULARY LEN |
+-----------------+---------------+---------------+------------------+--------------------+--------+------------------+------------+----------------+
- | 131072          | 16384         | 8             | true             | 128                | 126    | 53248            | 0          | 128256         |
+ | 32768           | 8192          | 8             | true             | 64                 | 80     | 29568            | 0          | 152064         |
+-----------------+---------------+---------------+------------------+--------------------+--------+------------------+------------+----------------+

+-------------------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                        TOKENIZER                                                                        |
+-------+-------------+------------+------------------+-----------+-----------+-----------+-----------+---------------+-----------------+---------------+
| MODEL | TOKENS SIZE | TOKENS LEN | ADDED TOKENS LEN | BOS TOKEN | EOS TOKEN | EOT TOKEN | EOM TOKEN | UNKNOWN TOKEN | SEPARATOR TOKEN | PADDING TOKEN |
+-------+-------------+------------+------------------+-----------+-----------+-----------+-----------+---------------+-----------------+---------------+
- | gpt2  | 2 MiB       | 128256     | N/A              | 128000    | 128009    | N/A       | N/A       | N/A           | N/A             | N/A           |
+ | gpt2  | 2.47 MiB    | 152064     | N/A              | 151643    | 151645    | N/A       | N/A       | N/A           | N/A             | 151643        |
+-------+-------------+------------+------------------+-----------+-----------+-----------+-----------+---------------+-----------------+---------------+

- +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
- |                                                                                  ESTIMATE                                                                                   |
- +-------+--------------+--------------------+-----------------+-----------+----------------+----------------+----------------+-------------------------+----------------------+
- | ARCH  | CONTEXT SIZE | BATCH SIZE (L / P) | FLASH ATTENTION | MMAP LOAD | EMBEDDING ONLY | OFFLOAD LAYERS | FULL OFFLOADED |           RAM           |        VRAM 0        |
- |       |              |                    |                 |           |                |                |                +------------+------------+---------+------------+
- |       |              |                    |                 |           |                |                |                |    UMA     |   NONUMA   |   UMA   |   NONUMA   |
- +-------+--------------+--------------------+-----------------+-----------+----------------+----------------+----------------+------------+------------+---------+------------+
- | llama | 131072       | 2048 / 512         | Disabled        | Supported | No             | 127 (126 + 1)  | Yes            | 684.53 MiB | 834.53 MiB | 126 GiB | 919.55 GiB |
- +-------+--------------+--------------------+-----------------+-----------+----------------+----------------+----------------+------------+------------+---------+------------+
+ +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+ |                                                                                 ESTIMATE                                                                                  |
+ +-------+--------------+--------------------+-----------------+-----------+----------------+----------------+----------------+-------------------------+--------------------+
+ | ARCH  | CONTEXT SIZE | BATCH SIZE (L / P) | FLASH ATTENTION | MMAP LOAD | EMBEDDING ONLY | OFFLOAD LAYERS | FULL OFFLOADED |           RAM           |       VRAM 0       |
+ |       |              |                    |                 |           |                |                |                +------------+------------+--------+-----------+
+ |       |              |                    |                 |           |                |                |                |    UMA     |   NONUMA   |  UMA   |  NONUMA   |
+ +-------+--------------+--------------------+-----------------+-----------+----------------+----------------+----------------+------------+------------+--------+-----------+
+ | qwen2 | 32768        | 2048 / 512         | Disabled        | Supported | No             | 81 (80 + 1)    | Yes            | 307.38 MiB | 457.38 MiB | 10 GiB | 73.47 GiB |
+ +-------+--------------+--------------------+-----------------+-----------+----------------+----------------+----------------+------------+------------+--------+-----------+

```
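The `BPW` column can be cross-checked from `SIZE` and `PARAMETERS` alone, since bits per weight is just the file size in bits divided by the parameter count. A back-of-the-envelope sketch with the figures from the qwen2 MODEL table above (1 GiB = 1024^3 bytes):

```shell
# 59.92 GiB * 8 bits per byte, spread over 72.71 billion parameters,
# reproduces the 7.08 bpw reported by GGUF Parser.
$ awk 'BEGIN { printf "%.2f bpw\n", 59.92 * 1024^3 * 8 / (72.71 * 1e9) }'
7.08 bpw
```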