FastAPI for BitNet inference framework functions:

* Benchmark BitNet models
* Calculate BitNet model perplexity
* Run the BitNet inference framework

A Dockerfile is included for running the FastAPI app in a containerized environment.
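As a rough sketch of what such a Dockerfile can look like, here is a minimal hypothetical version (the base image, file layout, `requirements.txt`, and `main:app` entry point below are assumptions for illustration, not the repository's actual Dockerfile, which also builds BitNet and prepares the models):

```dockerfile
# Hypothetical sketch -- the real Dockerfile also builds BitNet and downloads models.
FROM python:3.11-slim

WORKDIR /code

# Install the FastAPI app's dependencies (requirements.txt is assumed to exist)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Expose the port mapped by `docker run -p 8080:8080` below
EXPOSE 8080

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]
```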
```
docker run -d --name ai_container -p 8080:8080 fastapi_bitnet
```

Once it's running, navigate to http://127.0.0.1:8080/docs
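Once the server is up, the endpoints listed in the Swagger UI can be called from any HTTP client. A minimal sketch in Python (the `/benchmark` path and its query parameters are assumptions for illustration; check http://127.0.0.1:8080/docs for the actual routes):

```python
from urllib.parse import urlencode, urljoin

BASE_URL = "http://127.0.0.1:8080"


def build_url(path: str, **params: str) -> str:
    """Build a request URL against the local FastAPI server."""
    url = urljoin(BASE_URL, path)
    return f"{url}?{urlencode(params)}" if params else url


# Hypothetical benchmark endpoint -- the real route names are listed in /docs.
url = build_url("/benchmark", model="Llama3-8B-1.58-100B-tokens", n_token="128")
print(url)

# To actually send the request (requires the container to be running):
# import urllib.request
# with urllib.request.urlopen(url) as resp:
#     print(resp.read().decode())
```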
---

Note:
If you intend to use this in production, make sure to extend the Docker image with additional [authentication security](https://github.com/mjhea0/awesome-fastapi?tab=readme-ov-file#auth) steps. In its current state it is intended for local use only.
Building the Docker image requires upwards of 40GB of RAM for `Llama3-8B-1.58-100B-tokens`; if you have less than 64GB of RAM, you will probably run into issues.
The Dockerfile deletes the larger f32 files to reduce the image build time; comment out the `find /code/models/....` lines if you want the larger f32 files included.