LLAMA3_for_coding

Fine-tuning LLAMA3 on a coding instruction dataset

I added a programming-language label to the CodeAlpaca dataset (https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k) and filtered out some unused languages.
To assign the label, I ran the model https://huggingface.co/philomath-1209/programming-language-identification on the output column of the CodeAlpaca dataset and computed logits for each sample. To speed up inference, I used PyTorch's multiprocessing library.
After processing, around 10k samples remain.
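
The labeling and filtering step could look roughly like the sketch below. It is a minimal illustration, not the code in this repository: the function names `add_language` and `make_hf_classifier` and the `KEEP_LANGUAGES` allow-list are hypothetical, and the set of languages actually kept is an assumption. The classifier is passed in as a plain callable so the filtering logic can be tried without downloading the model.

```python
# Hypothetical allow-list; the languages actually kept are not stated in this README.
KEEP_LANGUAGES = {"Python", "JavaScript", "Java", "C++"}


def add_language(samples, classify, keep=KEEP_LANGUAGES):
    """Label each sample from its `output` field and drop languages outside `keep`."""
    labeled = []
    for sample in samples:
        language = classify(sample["output"])
        if language in keep:
            # Copy the sample and attach the predicted language as a new column.
            labeled.append({**sample, "language": language})
    return labeled


def make_hf_classifier(model_name="philomath-1209/programming-language-identification"):
    """Build a classifier callable from the Hugging Face model (needs `transformers`)."""
    from transformers import pipeline  # lazy import; the model downloads on first use

    clf = pipeline("text-classification", model=model_name)
    # The pipeline reports the top-scoring label, i.e. the argmax over the logits.
    return lambda code: clf(code, truncation=True)[0]["label"]
```

With the real model, this would be called as `add_language(rows, make_hf_classifier())`, where `rows` is the list of CodeAlpaca records (each with `instruction`, `input`, and `output` fields). Any callable that maps a code string to a language name can stand in for the model when testing.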

