An open-source project for customizing your own Large Language Model, designed to be easy to modify and learn from, in contrast to the current unfortunate situation where most released code is complex and opaque. Most teams release only inference code, without the pretraining code, e.g., OpenAI and Meta-Llama.
If possible, Computer Vision and multi-modal models will be added as well.
- A powerful and diverse dataloader implementation using multiple processes and threads, such as `FileSegmentsDataloader`, which processes multiple files under a given folder.
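As a rough illustration of intended usage (the import path and constructor arguments below are assumptions for illustration, not the actual API):

```python
from dlx.data import FileSegmentsDataloader  # hypothetical import path

# Hypothetical arguments; the real signature may differ.
loader = FileSegmentsDataloader(
    folder="data/corpus",  # directory whose files are read segment by segment
    num_workers=4,         # worker processes/threads reading in parallel
    batch_size=8,
)
for batch in loader:
    ...  # feed each batch to the training step
```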
The example projects live under the module `dlx.test`.
You can also get a quick hands-on experience by running:

```bash
torchrun --nproc_per_node 1 experiments/llama3/pretrain.py
```
- Add another method for `file_segments_dataloader` to generate the sample list.
- Add helper functions for a TensorBoard-style summary writer.
- Analyze the cause of low GPU utilization.
- Analyze when the `list index out of range` error occurs and whether that part of the logic needs to be modified.
- Study the `n_head_kv` argument of Llama3 and why it affects whether model building succeeds (a sketch of the underlying constraint follows this list).
- Add support for AMP (automatic mixed precision); a minimal sketch follows this list.
- Rethink the variable `self.change_file_event` and try using it to resolve the `list index out of range` error.
- Modify `file_segments_dataloader` to use an explicit start switch instead of starting automatically after initialization (a sketch of this pattern follows this list).
- Add CPU support for debugging purposes.
- Add token counting for statistics.
- Survey the token counts of other small LLMs.
- Finish integrating the DeepSeek and DDP interfaces.
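On the `n_head_kv` item: in Llama3's grouped-query attention, the number of query heads must be an integer multiple of the number of key/value heads, so an incompatible `n_head_kv` breaks the attention shapes and hence model building. A minimal sketch of the constraint; the dimension values below are assumed for illustration:

```python
import torch

n_heads, n_head_kv, head_dim = 32, 8, 128  # assumed Llama3-8B-style values

# Model building only works if query heads split evenly over KV heads.
assert n_heads % n_head_kv == 0, "n_heads must be divisible by n_head_kv"
n_rep = n_heads // n_head_kv  # query heads sharing one KV head

# K/V are projected with fewer heads, then repeated to match the queries.
bsz, seqlen = 2, 16
k = torch.randn(bsz, seqlen, n_head_kv, head_dim)
k = k.repeat_interleave(n_rep, dim=2)  # -> (bsz, seqlen, n_heads, head_dim)
assert k.shape == (bsz, seqlen, n_heads, head_dim)
```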
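For the AMP item, a minimal mixed-precision training step using the standard `torch.cuda.amp` API (a generic sketch, not this repo's training loop) looks like:

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()  # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(8, 1024, device="cuda")
with torch.cuda.amp.autocast():            # forward pass runs in reduced precision
    loss = model(x).float().pow(2).mean()  # toy loss for illustration

scaler.scale(loss).backward()  # scale the loss to avoid fp16 underflow
scaler.step(optimizer)         # unscale gradients, then step
scaler.update()
optimizer.zero_grad(set_to_none=True)
```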
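For the explicit start switch, one common pattern (a sketch under assumed internals, not the project's current code) is to gate the worker loop on a `threading.Event` that `start()` sets:

```python
import threading

class SwitchedLoader:
    """Sketch: the worker thread exists after __init__ but idles until start()."""

    def __init__(self):
        self._started = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()  # thread is created here, but blocks below

    def _run(self):
        self._started.wait()  # do nothing until start() is called
        # ... the actual file-reading / prefetching loop would go here ...

    def start(self):
        self._started.set()  # the explicit switch that turns the loader on
```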
If the `__len__` method is not implemented, the step count falls back to a fixed number. The current problem is that the last sample is always dropped, and this part of the logic is hard to modify.
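The dropped-last-sample behavior is typical of batching loops that only yield full batches. A generic illustration of the likely cause and the usual fix (not this repo's actual logic):

```python
def batches(samples, batch_size):
    """Generic batching loop of the kind that silently drops the tail."""
    buf = []
    for s in samples:
        buf.append(s)
        if len(buf) == batch_size:
            yield buf
            buf = []
    # Without this final flush, a trailing partial batch (the dropped
    # "last sample" described above) is lost when iteration ends.
    if buf:
        yield buf

print(list(batches(range(5), 2)))  # [[0, 1], [2, 3], [4]]
```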