Practical Llama 3, 3.1 and 3.2 inference implemented purely in Scala 3.6.4, leveraging the Java Vector API for performance.
This project supports running Llama models in GGUF format.
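Internally, performance hinges on SIMD kernels written against the incubating Vector API. Below is a minimal sketch, not code from this repository, of a vectorized dot product of the kind such inference kernels are built on (names and structure are illustrative):

```scala
// Minimal sketch of a Vector API dot product (illustrative, not repo code).
// Compile and run with: --add-modules=jdk.incubator.vector
import jdk.incubator.vector.{FloatVector, VectorOperators, VectorSpecies}

object DotProduct:
  private val species: VectorSpecies[java.lang.Float] = FloatVector.SPECIES_PREFERRED

  def dot(a: Array[Float], b: Array[Float]): Float =
    require(a.length == b.length, "vectors must have equal length")
    var acc = FloatVector.zero(species)
    var i = 0
    val upper = species.loopBound(a.length)
    // Process `species.length()` lanes per iteration with fused multiply-add.
    while i < upper do
      val va = FloatVector.fromArray(species, a, i)
      val vb = FloatVector.fromArray(species, b, i)
      acc = va.fma(vb, acc)
      i += species.length()
    // Horizontal reduction, then a scalar tail for the remaining elements.
    var sum = acc.reduceLanes(VectorOperators.ADD)
    while i < a.length do
      sum += a(i) * b(i)
      i += 1
    sum
```

`SPECIES_PREFERRED` selects the widest SIMD shape the host CPU supports, which is why the JDK 21+ requirement and the `--add-modules=jdk.incubator.vector` option below matter.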
To build and run this project, you will need:

- Java Development Kit (JDK) 21 or later (required for the Vector API).
- sbt (the Scala Build Tool).
To compile the project and create a runnable JAR file, use the `sbt assembly` command:

```
sbt assembly
```

This generates a fat JAR in the `target/scala-3.6.4/` directory (e.g., `target/scala-3.6.4/llmtest-assembly-0.1.0.jar`).
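The fat JAR is produced by the sbt-assembly plugin. If you need to adapt the build, the setup typically looks like the following sketch (the plugin version and merge strategy are assumptions, not taken from this repository):

```scala
// project/plugins.sbt -- enables the sbt-assembly plugin (version is an assumption)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.2.0")
```

```scala
// build.sbt -- illustrative fragment, not the repository's actual build definition
ThisBuild / scalaVersion := "3.6.4"

lazy val root = (project in file("."))
  .settings(
    name := "llmtest",
    // Forked runs also need the incubator module enabled.
    run / fork := true,
    run / javaOptions += "--add-modules=jdk.incubator.vector",
    // Drop duplicate META-INF entries when merging dependency JARs.
    assembly / assemblyMergeStrategy := {
      case PathList("META-INF", _*) => MergeStrategy.discard
      case _                        => MergeStrategy.first
    }
  )
```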
You can run the Llama model using the assembled JAR file. You must provide the path to the model file (GGUF format) using the `--model` or `-m` argument. Make sure to include the `--add-modules=jdk.incubator.vector` JVM option when running.
Example (Interactive Mode):

```
java --add-modules=jdk.incubator.vector -jar target/scala-3.6.4/llmtest-assembly-0.1.0.jar --model /path/to/your/model.gguf
```

Example (Single Prompt Mode):

```
java --add-modules=jdk.incubator.vector -jar target/scala-3.6.4/llmtest-assembly-0.1.0.jar \
  --model /path/to/your/model.gguf \
  --prompt "Translate the following English text to French: 'Hello world!'"
```
- `--model <path>`, `-m <path>`: (Required) Path to the model file in GGUF format.
- `--prompt <text>`, `-p <text>`: Run in single-prompt mode with the given text. If omitted, runs in interactive mode.
- `--system-prompt <text>`: Set a system prompt for the model.
- `--temperature <float>`: Sampling temperature (default: 0.1).
- `--topp <float>`: Top-P (nucleus) sampling value (default: 0.95).
- `--seed <long>`: Random seed (default: System.nanoTime).
- `--max-tokens <int>`: Maximum number of tokens to generate (default: 16384).
- `--stream <boolean>`: Print tokens as they are generated (default: true).
- `--echo <boolean>`: Print all tokens (including the prompt) to stderr (default: false).
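For instance, a run that combines several of these options (values purely illustrative) might look like:

```
java --add-modules=jdk.incubator.vector -jar target/scala-3.6.4/llmtest-assembly-0.1.0.jar \
  --model /path/to/your/model.gguf \
  --system-prompt "You are a concise assistant." \
  --temperature 0.7 \
  --topp 0.9 \
  --seed 42 \
  --max-tokens 512 \
  --prompt "Explain nucleus sampling in two sentences."
```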
Join our Discord server to discuss the project, ask questions, and share your results: https://discord.com/invite/vgEg2ZtxCw
This project is licensed under the Apache License, Version 2.0. See the LICENSE file for details.
This project is inspired by and based on the work of: