Practical Llama 3, 3.1 and 3.2 inference implemented purely in Scala 3.6.4, leveraging the Java Vector API for performance.
This project supports running Llama models in GGUF format.
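Internally, performance hinges on SIMD kernels written against the incubating Vector API. Below is a minimal sketch, not code from this repository, of a vectorized dot product of the kind such inference kernels are built on (names and structure are illustrative):

```scala
// Minimal sketch of a Vector API dot product (illustrative, not repo code).
// Compile and run with: --add-modules=jdk.incubator.vector
import jdk.incubator.vector.{FloatVector, VectorOperators, VectorSpecies}

object DotProduct:
  private val species: VectorSpecies[java.lang.Float] = FloatVector.SPECIES_PREFERRED

  def dot(a: Array[Float], b: Array[Float]): Float =
    require(a.length == b.length, "vectors must have equal length")
    var acc = FloatVector.zero(species)
    var i = 0
    val upper = species.loopBound(a.length)
    // Process `species.length()` lanes per iteration with fused multiply-add.
    while i < upper do
      val va = FloatVector.fromArray(species, a, i)
      val vb = FloatVector.fromArray(species, b, i)
      acc = va.fma(vb, acc)
      i += species.length()
    // Horizontal reduction, then a scalar tail for the remaining elements.
    var sum = acc.reduceLanes(VectorOperators.ADD)
    while i < a.length do
      sum += a(i) * b(i)
      i += 1
    sum
```

`SPECIES_PREFERRED` selects the widest SIMD shape the host CPU supports, which is why the JDK 21+ requirement and the `--add-modules=jdk.incubator.vector` option below matter.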
To build and run this project, you will need:

- Java Development Kit (JDK) 21 or later (required for the Vector API).
- sbt (the Scala Build Tool).
To compile the project and create a runnable JAR file, use the `sbt assembly` command:

```
sbt assembly
```

This generates a fat JAR in the `target/scala-3.6.4/` directory (e.g., `target/scala-3.6.4/llmtest-assembly-0.1.0.jar`).
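The fat JAR is produced by the sbt-assembly plugin. If you need to adapt the build, the setup typically looks like the following sketch (the plugin version and merge strategy are assumptions, not taken from this repository):

```scala
// project/plugins.sbt -- enables the sbt-assembly plugin (version is an assumption)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.2.0")
```

```scala
// build.sbt -- illustrative fragment, not the repository's actual build definition
ThisBuild / scalaVersion := "3.6.4"

lazy val root = (project in file("."))
  .settings(
    name := "llmtest",
    // Forked runs also need the incubator module enabled.
    run / fork := true,
    run / javaOptions += "--add-modules=jdk.incubator.vector",
    // Drop duplicate META-INF entries when merging dependency JARs.
    assembly / assemblyMergeStrategy := {
      case PathList("META-INF", _*) => MergeStrategy.discard
      case _                        => MergeStrategy.first
    }
  )
```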
You can run the Llama model using the assembled JAR file. You must provide the path to the model file (GGUF format) using the `--model` or `-m` argument. Make sure to include the `--add-modules=jdk.incubator.vector` JVM option when running.
Example (Interactive Mode):

```
java --add-modules=jdk.incubator.vector -jar target/scala-3.6.4/llmtest-assembly-0.1.0.jar --model /path/to/your/model.gguf
```

Example (Single Prompt Mode):

```
java --add-modules=jdk.incubator.vector -jar target/scala-3.6.4/llmtest-assembly-0.1.0.jar \
  --model /path/to/your/model.gguf \
  --prompt "Translate the following English text to French: 'Hello world!'"
```
- `--model <path>`, `-m <path>`: (Required) Path to the model file in GGUF format.
- `--prompt <text>`, `-p <text>`: Run in single-prompt mode with the given text. If omitted, runs in interactive mode.
- `--system-prompt <text>`: Set a system prompt for the model.
- `--temperature <float>`: Sampling temperature (default: 0.1).
- `--topp <float>`: Top-P (nucleus) sampling value (default: 0.95).
- `--seed <long>`: Random seed (default: System.nanoTime).
- `--max-tokens <int>`: Maximum number of tokens to generate (default: 16384).
- `--stream <boolean>`: Print tokens as they are generated (default: true).
- `--echo <boolean>`: Print all tokens (including the prompt) to stderr (default: false).
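For instance, a run that combines several of these options (values purely illustrative) might look like:

```
java --add-modules=jdk.incubator.vector -jar target/scala-3.6.4/llmtest-assembly-0.1.0.jar \
  --model /path/to/your/model.gguf \
  --system-prompt "You are a concise assistant." \
  --temperature 0.7 \
  --topp 0.9 \
  --seed 42 \
  --max-tokens 512 \
  --prompt "Explain nucleus sampling in two sentences."
```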
Join our Discord server to discuss the project, ask questions, and share your results: https://discord.com/invite/vgEg2ZtxCw
This project is licensed under the Apache License, Version 2.0. See the LICENSE file for details.
This project is inspired by and based on the work of: