
Commit a9bcdd7

Update smollm3.md -- missing citation for intra-document masking (#3125)
* Update smollm3.md: The citation for intra-document masking is missing, fixed it
* Update smollm3.md: using https://huggingface.co/papers/2402.13991 instead of arxiv
1 parent cb18400 commit a9bcdd7

File tree

1 file changed: +1, -1 lines changed


smollm3.md

Lines changed: 1 addition & 1 deletion
@@ -73,7 +73,7 @@ SmolLM3 follows a transformer decoder architecture with tied embedding similar t
 
 **NoPE:** We implemented NoPE from "[Rope to Nope and Back Again: A New Hybrid Attention Strategy](https://huggingface.co/papers/2501.18795)" (Yang et al., 2025), selectively removing rotary position embeddings from every 4th layer. This approach improves long context performance without affecting short context capabilities, as confirmed by our ablations.
 
-**Intra-Document Masking:** During training, we use attention masking to ensure tokens from different documents in the same training sequence don't attend to each other. Similar to Llama 3, this helps with faster and more stable long context training while maintaining short context performance.
+**Intra-Document Masking:** Following "[Analysing The Impact of Sequence Composition on Language Model Pre-Training](https://huggingface.co/papers/2402.13991)", during training, we use attention masking to ensure tokens from different documents in the same training sequence don't attend to each other. Similar to Llama 3, this helps with faster and more stable long context training while maintaining short context performance.
 
 **Training Stability:** Following OLMo 2, we remove weight decay from embedding layers to improve training stability. This modification contributed to more stable training dynamics, with embedding norms naturally stabilizing at healthier values during training without impacting overall performance in our ablations.
 
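The NoPE scheme in the diff removes rotary position embeddings from every 4th layer. A minimal sketch of that layer selection, assuming 0-indexed layers where layers 3, 7, 11, ... drop RoPE (the exact indexing convention is an assumption, not stated in the diff):

```python
def uses_rope(layer_idx: int) -> bool:
    """Return True if this decoder layer keeps rotary position embeddings.

    Assumes 0-indexed layers; every 4th layer (indices 3, 7, 11, ...)
    drops RoPE, i.e. becomes a NoPE layer.
    """
    return (layer_idx + 1) % 4 != 0

# For an 8-layer stack: layers 3 and 7 are NoPE, the rest keep RoPE.
rope_layers = [uses_rope(i) for i in range(8)]
```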

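The intra-document masking described in the changed line can be sketched as a per-sequence attention mask that combines the usual causal constraint with a same-document constraint. This is an illustrative sketch, not SmolLM3's actual training code; the `doc_ids` representation of document boundaries in a packed sequence is an assumption:

```python
import numpy as np

def intra_document_mask(doc_ids: np.ndarray) -> np.ndarray:
    """Build a boolean attention mask for one packed training sequence.

    doc_ids[i] is the document index of token i. Token i may attend to
    token j only if j <= i (causal) and both tokens belong to the same
    document (intra-document masking), so documents packed into the same
    sequence never attend to each other.
    """
    n = len(doc_ids)
    causal = np.tril(np.ones((n, n), dtype=bool))
    same_doc = doc_ids[:, None] == doc_ids[None, :]
    return causal & same_doc

# Two documents packed into one 5-token sequence: tokens 0-2 are doc 0,
# tokens 3-4 are doc 1. Token 3 cannot attend to tokens 0-2.
mask = intra_document_mask(np.array([0, 0, 0, 1, 1]))
```

In practice this mask (or an equivalent block-diagonal variant) would be passed to the attention layer in place of a plain causal mask.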