Commit 9bca66f

More tweaks for 2025-03-06 event
Signed-off-by: Sujee Maniyam <[email protected]>
1 parent 269387f commit 9bca66f

1 file changed: +4 -3 lines changed

events/2025-03-06__gneissweb.md

@@ -14,9 +14,10 @@

 **Introducing GneissWeb (pronounced "niceWeb"), a state-of-the-art LLM pre-training dataset**

-This introductory session will not be a Hands-on workshop, but I will take the audience through code examples that they can try at their leisure. Later on, in the meetups, I can make this a hands-on workshop.
-Description: At IBM, responsible AI implies transparency in training data:
-Introducing GneissWeb (pronounced “niceWeb”), a state-of-the-art LLM pre-training dataset with ~10 Trillion tokens derived from FineWeb, with open recipes, results, and tools for reproduction!
+At IBM, responsible AI implies transparency in training data:
+Introducing GneissWeb (pronounced "niceWeb"), a state-of-the-art LLM pre-training dataset with ~10 Trillion tokens derived from FineWeb, with open recipes, results, and tools for reproduction!
+
+In this session we will go over how we created GneissWeb and discuss tools and techniques used. We will provide code examples that you can try at your leisure.

 👉 > 2% avg improvement in benchmark performance over FineWeb
 👉 [Huggingface page](https://huggingface.co/datasets/ibm-granite/GneissWeb){:target="_blank" rel="noopener"}
