UPSTREAM PR #19488: model: add JAIS-2 architecture support (#1168)
Conversation
Add support for the JAIS-2 family of Arabic-English bilingual models from Inception AI (https://huggingface.co/inceptionai/Jais-2-8B-Chat).

Architecture characteristics:
- LayerNorm (not RMSNorm) with biases
- ReLU² (ReLU squared) activation function
- Separate Q/K/V projections with biases
- Simple MLP without gate projection (up -> act -> down)
- RoPE positional embeddings
- GPT-2 BPE tokenizer

Supported model sizes:
- Jais-2-8B (32 layers, 26 heads, 3328 hidden)
- Jais-2-70B (68 layers, 56 heads, 7168 hidden)

Tested with quantizations: BF16, Q8_0, Q6_K, Q5_K_M, Q5_0, Q4_K_M, Q4_0, Q3_K_M, Q2_K

Note: JAIS-2 requires F32 precision accumulators for numerical stability and uses standard attention (not flash attention) on CUDA backends.
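The feed-forward path described above (a simple MLP with no gate projection, biases on the projections, and a ReLU² activation) can be sketched in NumPy. This is an illustrative sketch only; the function and parameter names are not the actual llama.cpp identifiers:

```python
import numpy as np

def relu_squared(x):
    # ReLU² as used by JAIS-2: square the output of a standard ReLU.
    return np.square(np.maximum(x, 0.0))

def jais2_mlp(x, w_up, b_up, w_down, b_down):
    # Simple MLP without a gate projection: up -> activation -> down.
    # Biases are included on both projections, per the description above.
    h = x @ w_up + b_up
    h = relu_squared(h)
    return h @ w_down + b_down
```

Contrast this with LLaMA-style blocks, which add a parallel gate projection (SwiGLU) and omit the biases.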
Overview

Analysis of 115,033 functions across the JAIS-2 architecture integration reveals minimal performance impact. Modified functions: 36 (0.03%), new functions: 31, removed: 0, unchanged: 114,966 (99.94%).

Power Consumption Changes:

Function Analysis

All performance changes occur in C++ STL functions during initialization, not in inference hot paths.

Regressions (non-critical):

Improvements:

All changes are compiler optimization artifacts in initialization code; no source code modifications justify the performance differences. Cumulative impact: -89μs in model loading (0.001% of total), +112ns per inference batch (0.0002% of total).

Additional Findings

No modifications were made to performance-critical operations: matrix multiplication (70-90% of inference time), attention mechanisms, quantization kernels, and GPU backends remain unchanged. Flash Attention is enabled for JAIS-2 as an optimization. GGML libraries show a 0.000% power change, confirming no alterations to tensor operations. The 0.143% power increase in libllama.so represents static code addition (748+ tensor definitions) without runtime overhead.

🔎 Full breakdown: Loci Inspector.
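The F32-accumulator requirement mentioned in the PR description can be illustrated with a tiny, generic floating-point sketch (this is not code from the PR): once a half-precision running sum reaches 2048, adding 1.0 is rounded away entirely, which is why long reductions over ReLU²-inflated activations need float32 accumulation.

```python
import numpy as np

# float16 has a 10-bit mantissa, so the spacing between representable
# values at 2048 is 2.0: adding 1.0 ties to even and is lost entirely.
acc16 = np.float16(2048.0)
acc16 = np.float16(acc16 + np.float16(1.0))  # still 2048.0

# The same update in float32 is retained.
acc32 = np.float32(2048.0) + np.float32(1.0)  # 2049.0
```

Because ReLU² squares activations, partial sums grow quickly, hitting this rounding regime much sooner than with tamer activations.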
Note
Source pull request: ggml-org/llama.cpp#19488
GGUF weights on the Hub (for tests): https://huggingface.co/inceptionai/Jais-2-8B-Chat-GGUF
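The normalization noted in the description (LayerNorm with biases, rather than the RMSNorm used by most LLaMA-style models) can be sketched as follows; the names here are illustrative, not the llama.cpp identifiers:

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Full LayerNorm: subtract the mean, divide by the standard deviation,
    # then apply a learned scale (gamma) and bias (beta). RMSNorm skips the
    # mean subtraction and has no bias term, which is the key difference
    # the JAIS-2 conversion has to account for.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps) * gamma + beta
```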