
Commit f4ac173

Merge pull request #356 from janhq/chore/documentation-0.2.10

Documentation 0.2.10

2 parents 33c9540 + 0318d77

7 files changed (+75, -5 lines)


README.md

Lines changed: 4 additions & 1 deletion
@@ -125,14 +125,17 @@ Table of parameters
 | `n_batch` | Integer | The batch size for the prompt eval step |
 | `caching_enabled` | Boolean | Whether to enable prompt caching |
 | `clean_cache_threshold` | Integer | Number of chats that will trigger the clean-cache action |
+| `grp_attn_n` | Integer | Group attention factor in self-extend |
+| `grp_attn_w` | Integer | Group attention width in self-extend |
 
 ***OPTIONAL***: You can run Nitro on a different port (e.g., 5000 instead of 3928) by running it manually in the terminal:
 ```zsh
-./nitro 1 127.0.0.1 5000 ([thread_num] [host] [port])
+./nitro 1 127.0.0.1 5000 ([thread_num] [host] [port] [uploads_folder_path])
 ```
 - thread_num: the number of threads for the Nitro web server
 - host: the host value, normally 127.0.0.1 or 0.0.0.0
 - port: the port Nitro is deployed on
+- uploads_folder_path: a custom path for file uploads in Drogon
 
 Nitro server is compatible with the OpenAI format, so you can expect the same output as the OpenAI ChatGPT API.

docs/docs/examples/chatboxgpt.md

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 ---
-title: Nitro on browser
+title: Nitro with ChatGPTBox
 description: Nitro integration guide for use in a web browser.
 keywords: [Nitro, Google Chrome, browser, Jan, fast inference, inference server, local AI, large language model, OpenAI compatible, open source, llama]
 ---

docs/docs/features/grammar.md

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+---
+title: GBNF Grammar
+description: Constraining Nitro's output with GBNF grammars
+keywords: [Nitro, Jan, fast inference, inference server, local AI, large language model, OpenAI compatible, open source, llama]
+---
+
+## GBNF Grammar
+
+GBNF (GGML BNF) makes it easy to set rules for how a model talks or writes. Think of it as teaching the model to always respond in a given shape, whether that is emoji-only or proper JSON.
+
+Backus-Naur Form (BNF) is a notation for describing the syntax of computer languages, file formats, and protocols. GBNF builds on BNF, adding modern features similar to those found in regular expressions.
+
+In GBNF, we create production rules to guide how a model forms its responses. These rules combine fixed characters (like letters or emojis) with flexible parts that can vary. Each rule follows the format: `nonterminal ::= sequence...`.
+
+To get a clearer picture, check out [this guide](https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md).
+
+## Use GBNF Grammar in Nitro
+
+To make your Nitro model follow specific speaking or writing rules, use this command:
+
+```bash title="Nitro Inference With Grammar" {10}
+curl http://localhost:3928/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "messages": [
+      {
+        "role": "user",
+        "content": "Who won the world series in 2020?"
+      }
+    ],
+    "grammar_file": "/path/to/grammarfile"
+  }'
+```
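
For illustration, a minimal grammar file that restricts the model to a yes/no answer might look like the sketch below; the file path and rule names are placeholders, not part of the documented API:

```bash title="Illustrative grammar file"
# Write a minimal GBNF grammar that only allows the answers "yes" or "no".
# The path is a placeholder; point grammar_file at wherever you save it.
cat > /path/to/grammarfile <<'EOF'
root ::= answer
answer ::= "yes" | "no"
EOF
```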

docs/docs/features/load-unload.md

Lines changed: 3 additions & 1 deletion
@@ -77,4 +77,6 @@ In case you get an error while loading models, please check that the model path is correct.
 | `ai_prompt` | String | The prompt to use for the AI assistant. |
 | `system_prompt` | String | The prompt for system rules. |
 | `pre_prompt` | String | The prompt to use for internal configuration. |
-|`clean_cache_threshold`| Integer| Number of chats that will trigger clean cache action.|
+|`clean_cache_threshold`| Integer| Number of chats that will trigger clean cache action.|
+|`grp_attn_n`|Integer|Group attention factor in self-extend|
+|`grp_attn_w`|Integer|Group attention width in self-extend|
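
For illustration, these parameters are supplied in the same loadmodel request shown elsewhere in this changeset; the model path, prompt strings, and threshold value below are placeholders to adapt to your own model's chat template:

```bash title="Illustrative loadmodel call"
# All values below are placeholders; adjust to your model's chat template.
curl http://localhost:3928/inferences/llamacpp/loadmodel \
  -H 'Content-Type: application/json' \
  -d '{
    "llama_model_path": "/path/to/your_model.gguf",
    "pre_prompt": "A chat between a user and an assistant.",
    "system_prompt": "SYSTEM: ",
    "ai_prompt": "ASSISTANT: ",
    "clean_cache_threshold": 5
  }'
```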

docs/docs/features/multi-thread.md

Lines changed: 2 additions & 1 deletion
@@ -22,12 +22,13 @@ For more information on threading, visit [Drogon's Documentation](https://github
 To increase the number of threads used by Nitro, use the following command syntax:
 
 ```bash title="Nitro deploy server format"
-nitro [thread_num] [host] [port]
+nitro [thread_num] [host] [port] [uploads_folder_path]
 ```
 
 - **thread_num:** Specifies the number of threads for the Nitro server.
 - **host:** The host address, normally `127.0.0.1` (localhost) or `0.0.0.0` (all interfaces).
 - **port:** The port number where Nitro is deployed.
+- **uploads_folder_path:** Sets a custom path for file uploads in Drogon; otherwise, the current folder is used as the default location.
 
 To launch Nitro with 4 threads, enter this command in the terminal:
 ```bash title="Example"
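
As a sketch of the new optional argument (the folder path below is a placeholder, not a documented default), a launch with a custom uploads folder might look like:

```bash title="Illustrative launch with uploads folder"
# 4 threads, localhost, port 5000, uploads stored under a custom folder.
./nitro 4 127.0.0.1 5000 /tmp/nitro-uploads
```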

docs/docs/features/self-extend.md

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+---
+title: Self-Extend
+description: Self-Extend LLM Context Window Without Tuning
+keywords: [long context, longlm, Nitro, Jan, fast inference, inference server, local AI, large language model, OpenAI compatible, open source, llama]
+---
+
+## Enhancing LLMs with Self-Extend
+Self-Extend offers an innovative approach to increasing the context window of Large Language Models (LLMs) without the usual need for re-tuning. It adapts the attention mechanism during the inference phase, eliminating the need for additional training or fine-tuning.
+
+For in-depth technical insights, refer to the LongLM research [paper](https://arxiv.org/pdf/2401.01325.pdf).
+
+## Activating Self-Extend for LLMs
+
+To activate the Self-Extend feature while loading your model, use the following command:
+
+```bash title="Enable Self-Extend" {6,7}
+curl http://localhost:3928/inferences/llamacpp/loadmodel \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "llama_model_path": "/path/to/your_model.gguf",
+    "ctx_len": 8192,
+    "grp_attn_n": 4,
+    "grp_attn_w": 2048
+  }'
+```
+
+**Note:**
+- For optimal performance, `grp_attn_w` should be as large as possible, but smaller than the training context length.
+- Setting `grp_attn_n` between 2 and 4 is recommended for peak efficiency; higher values may produce increasingly incoherent output.
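
Once the model is loaded with these settings, you can exercise the extended window through the OpenAI-compatible endpoint shown elsewhere in this changeset; the prompt below is a placeholder:

```bash title="Illustrative chat call after loading"
# Standard OpenAI-compatible chat request; paste a prompt longer than the
# model's original training context to exercise the extended window.
curl http://localhost:3928/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Place a very long document here and ask for a summary."
      }
    ]
  }'
```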

docs/sidebars.js

Lines changed: 3 additions & 1 deletion
@@ -52,7 +52,9 @@ const sidebars = {
       "features/load-unload",
       "features/warmup",
       "features/prompt",
-      "features/log"
+      "features/log",
+      "features/self-extend",
+      "features/grammar",
     ],
   },
   {
