XpressAI/llama4s

Llama4S

Practical Llama 3, 3.1, and 3.2 inference implemented purely in Scala 3.6.4, leveraging the Java Vector API for performance.

This project supports running Llama models in GGUF format.
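A GGUF file begins with the 4-byte ASCII magic GGUF. As a rough illustration (this helper is not part of llama4s; it is a sketch in Java, which the project runs on), here is one way to check that a file is at least plausibly a GGUF model before handing it to the runner:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class GgufCheck {
    // A GGUF file starts with the ASCII bytes 'G', 'G', 'U', 'F'.
    static boolean looksLikeGguf(String path) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            byte[] magic = new byte[4];
            if (f.read(magic) != 4) return false;
            return magic[0] == 'G' && magic[1] == 'G'
                && magic[2] == 'U' && magic[3] == 'F';
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(looksLikeGguf(args[0]) ? "GGUF model" : "not GGUF");
    }
}
```

This only inspects the magic bytes; a real loader would go on to read the version and metadata key/value section as well.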

Prerequisites

  • Java Development Kit (JDK) 21 or later (required for the Vector API).
  • sbt (Scala Build Tool).

Building

To compile the project and create a runnable JAR file, use the sbt assembly command:

sbt assembly

This will generate a fat JAR file in the target/scala-3.6.4/ directory (e.g., target/scala-3.6.4/llmtest-assembly-0.1.0.jar).

Running

You can run the Llama model using the assembled JAR file. You must provide the path to the model file (.gguf format) using the --model or -m argument.

Make sure to include the --add-modules=jdk.incubator.vector JVM option when running.

Example (Interactive Mode):

java --add-modules=jdk.incubator.vector -jar target/scala-3.6.4/llmtest-assembly-0.1.0.jar --model /path/to/your/model.gguf

Example (Single Prompt Mode):

java --add-modules=jdk.incubator.vector -jar target/scala-3.6.4/llmtest-assembly-0.1.0.jar \
  --model /path/to/your/model.gguf \
  --prompt "Translate the following English text to French: 'Hello world!'"

Command-Line Options

  • --model <path>, -m <path>: (Required) Path to the model file in GGUF format.
  • --prompt <text>, -p <text>: Run in single-prompt mode with the given text. If omitted, runs in interactive mode.
  • --system-prompt <text>: Set a system prompt for the model.
  • --temperature <float>: Sampling temperature (default: 0.1).
  • --topp <float>: Top-P (nucleus) sampling value (default: 0.95).
  • --seed <long>: Random seed (default: System.nanoTime).
  • --max-tokens <int>: Maximum number of tokens to generate (default: 16384).
  • --stream <boolean>: Print tokens as they are generated (default: true).
  • --echo <boolean>: Print all tokens (including prompt) to stderr (default: false).
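To make the two sampling options concrete: --temperature rescales the logits before the softmax (lower values sharpen the distribution toward the top token), and --topp keeps only the smallest set of highest-probability tokens whose cumulative mass reaches p, renormalising what is left. A minimal Java sketch of that interaction (illustrative only, not the project's actual sampler; the logit values are made up):

```java
import java.util.Arrays;
import java.util.Comparator;

public class SamplingSketch {
    // Scale logits by 1/temperature, then apply a numerically stable softmax.
    static double[] softmax(double[] logits, double temperature) {
        double[] scaled = new double[logits.length];
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < logits.length; i++) {
            scaled[i] = logits[i] / temperature;
            max = Math.max(max, scaled[i]);
        }
        double sum = 0.0;
        for (int i = 0; i < scaled.length; i++) {
            scaled[i] = Math.exp(scaled[i] - max);
            sum += scaled[i];
        }
        for (int i = 0; i < scaled.length; i++) scaled[i] /= sum;
        return scaled;
    }

    // Top-p (nucleus) filtering: keep the smallest set of highest-probability
    // tokens whose cumulative mass reaches p, then renormalise.
    static double[] topP(double[] probs, double p) {
        Integer[] order = new Integer[probs.length];
        for (int i = 0; i < probs.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble(i -> -probs[i]));
        double[] kept = new double[probs.length];
        double cum = 0.0;
        for (int i = 0; i < order.length && cum < p; i++) {
            kept[order[i]] = probs[order[i]];
            cum += probs[order[i]];
        }
        double sum = 0.0;
        for (double v : kept) sum += v;
        for (int i = 0; i < kept.length; i++) kept[i] /= sum;
        return kept;
    }

    public static void main(String[] args) {
        double[] logits = {2.0, 1.0, 0.1};
        double[] probs = softmax(logits, 0.1);   // low temperature: near-greedy
        double[] filtered = topP(probs, 0.95);
        System.out.printf("%.4f %.4f %.4f%n", filtered[0], filtered[1], filtered[2]);
    }
}
```

At temperature 0.1 the first logit dominates, so the 0.95 cutoff keeps a single token and this prints 1.0000 0.0000 0.0000; at temperature 1.0 the distribution stays broad enough that all three tokens survive the same cutoff.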

Community

Join our Discord server to discuss the project, ask questions, and share your results: https://discord.com/invite/vgEg2ZtxCw

License

This project is licensed under the Apache License, Version 2.0. See the LICENSE file for details.

Acknowledgements

This project is inspired by and based on the work of:
