
Commit 20bea94

feat: support split

Signed-off-by: thxCode <[email protected]>

1 parent: 71623de

File tree

8 files changed: +688 −260 lines changed


README.md

Lines changed: 156 additions & 42 deletions
Large diffs are not rendered by default.

cmd/gguf-parser/README.md

Lines changed: 3 additions & 1 deletion
@@ -36,13 +36,15 @@ GLOBAL OPTIONS:
    --no-kv-offload, --nkvo                              Specify disabling Key-Value offloading, which is used to estimate the usage. Disable Key-Value offloading can reduce the usage of VRAM. (default: false)
    --no-mmap                                            Specify disabling Memory-Mapped using, which is used to estimate the usage. Memory-Mapped can avoid loading the entire model weights into RAM. (default: false)
    --parallel-size value, --parallel value, --np value  Specify the number of parallel sequences to decode, which is used to estimate the usage. (default: 1)
-   --platform-footprint value                           Specify the platform footprint(RAM,VRAM) in MiB, which is used to estimate the NonUMA usage, default is 150,250. Different platform always gets different RAM and VRAM footprints, for example, within CUDA, 'cudaMemGetInfo' would occupy some RAM and VRAM, see https://stackoverflow.com/questions/64854862/free-memory-occupied-by-cudamemgetinfo. (default: "150,250")
+   --platform-footprint value                           Specify the platform footprint(RAM,VRAM) of running host in MiB, which is used to estimate the NonUMA usage, default is 150,250. Different platform always gets different RAM and VRAM footprints, for example, within CUDA, 'cudaMemGetInfo' would occupy some RAM and VRAM, see https://stackoverflow.com/questions/64854862/free-memory-occupied-by-cudamemgetinfo. (default: "150,250")
    --split-mode value, --sm value                       Specify how to split the model across multiple devices, which is used to estimate the usage, select from [layer, row, none]. Since gguf-parser always estimates the usage of VRAM, "none" is meaningless here, keep for compatibility. (default: "layer")
    --tensor-split value, --ts value                     Specify the fraction of the model to offload to each device, which is used to estimate the usage, it is a comma-separated list of integer. Since gguf-parser cannot recognize the host GPU devices or RPC servers, must explicitly set --tensor-split to indicate how many devices are used.
    --ubatch-size value, --ub value                      Specify the physical maximum batch size, which is used to estimate the usage. (default: 512)

  Load

+   --cache-expiration value                             Specify the expiration of cache, works with --url/--hf-*/--ms-*/--ol-*. (default: 24h0m0s)
+   --cache-path value                                   Cache the read result to the path, works with --url/--hf-*/--ms-*/--ol-*. (default: "/Users/thxcode/.cache/gguf-parser")
    --skip-cache                                         Skip cache, works with --url/--hf-*/--ms-*/--ol-*, default is caching the read result. (default: false)
    --skip-dns-cache                                     Skip DNS cache, works with --url/--hf-*/--ms-*/--ol-*, default is caching the DNS lookup result. (default: false)
    --skip-proxy                                         Skip proxy settings, works with --url/--hf-*/--ms-*/--ol-*, default is respecting the environment variables HTTP_PROXY/HTTPS_PROXY/NO_PROXY. (default: false)

cmd/gguf-parser/main.go

Lines changed: 25 additions & 0 deletions
@@ -9,6 +9,7 @@ import (
 	"strconv"
 	"strings"
 	"sync"
+	"time"

 	"github.com/gpustack/gguf-parser-go/util/anyx"
 	"github.com/gpustack/gguf-parser-go/util/json"

@@ -274,6 +275,22 @@ func main() {
 			"works with --url/--hf-*/--ms-*/--ol-*, " +
 			"default is detecting the range download support.",
 	},
+	&cli.DurationFlag{
+		Destination: &cacheExpiration,
+		Value:       cacheExpiration,
+		Category:    "Load",
+		Name:        "cache-expiration",
+		Usage: "Specify the expiration of cache, " +
+			"works with --url/--hf-*/--ms-*/--ol-*.",
+	},
+	&cli.StringFlag{
+		Destination: &cachePath,
+		Value:       cachePath,
+		Category:    "Load",
+		Name:        "cache-path",
+		Usage: "Cache the read result to the path, " +
+			"works with --url/--hf-*/--ms-*/--ol-*.",
+	},
 	&cli.BoolFlag{
 		Destination: &skipCache,
 		Value:       skipCache,

@@ -552,6 +569,8 @@ var (
 	skipTLSVerify          bool
 	skipDNSCache           bool
 	skipRangDownloadDetect bool
+	cacheExpiration        = 24 * time.Hour
+	cachePath              = DefaultCachePath()
 	skipCache              bool
 	// estimate options
 	ctxSize = -1

@@ -608,6 +627,12 @@ func mainAction(c *cli.Context) error {
 	if skipRangDownloadDetect {
 		ropts = append(ropts, SkipRangeDownloadDetection())
 	}
+	if cacheExpiration > 0 {
+		ropts = append(ropts, UseCacheExpiration(cacheExpiration))
+	}
+	if cachePath != "" {
+		ropts = append(ropts, UseCachePath(cachePath))
+	}
 	if skipCache {
 		ropts = append(ropts, SkipCache())
 	}
