How to check overfitting #5061
Comments
@aseprohman if you train long enough, the validation losses will eventually start to overfit; if they haven't yet, you simply haven't trained long enough. For descriptions of the individual loss components, see the original YOLO publications.
Which chart should I check to determine overfitting: box_loss or obj_loss?
@aseprohman overfitting can occur in any val loss component.
So there is no priority chart for checking overfitting during validation? If either one starts to overfit, does that mean training should be stopped?
Hello @aseprohman, did you figure out how to tell whether the model is overfitting? Do both val plots need to be checked, or can we say it is overfitting as soon as one val loss starts increasing relative to the training loss? Please let me know.
Hi @suryasid09, to determine if the model is overfitting, it's best to check all the validation loss plots, including val/box_loss, val/obj_loss, and val/cls_loss. However, it's also possible that some validation losses reach their minimum value and stay there while training continues to improve, which is known as convergence. In this case it's not a sign of overfitting, and you should continue training the model until you achieve satisfactory results. Overall, you should use your best judgment to decide whether the model is overfitting based on all the validation loss plots in conjunction with your specific use case and performance goals.
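As a rough illustration (not from the original comment), assuming a standard YOLOv5 run directory containing the results.csv that train.py writes, you can plot the training and validation loss curves side by side to spot the epoch where the val losses start climbing:

```python
# Sketch: compare train vs. val losses from a YOLOv5 results.csv
# The path is an assumption -- adjust "runs/train/exp" to your own run folder.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("runs/train/exp/results.csv")
df.columns = [c.strip() for c in df.columns]  # YOLOv5 pads column names with spaces

pairs = [
    ("train/box_loss", "val/box_loss"),
    ("train/obj_loss", "val/obj_loss"),
    ("train/cls_loss", "val/cls_loss"),
]

fig, axes = plt.subplots(1, len(pairs), figsize=(15, 4))
for ax, (tr, va) in zip(axes, pairs):
    if tr in df.columns and va in df.columns:
        ax.plot(df["epoch"], df[tr], label=tr)
        ax.plot(df["epoch"], df[va], label=va)
        ax.set_xlabel("epoch")
        ax.legend()
plt.tight_layout()
plt.show()
```

If a val curve bottoms out and then trends upward while its train counterpart keeps falling, that is the overfitting signature described above.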
Hi, I have already followed the recommended dataset practices. I trained for 300 epochs without changing any default settings, but training stopped partway through (after about 100 epochs) because the metrics showed no further improvement. Although both the validation and training losses were still decreasing, metrics like precision, recall, and mAP seem stuck around 0.53. Does this mean the training has reached its final result and should stop? I am trying to detect small objects in images, so could you please suggest some tips to improve the results? Thank you.
@wtjasmine hi! Based on what you've described, the training seems to have reached a plateau in terms of metrics such as precision, recall, and mAP. This could indicate that the model has converged and further training may not result in significant improvement. If you are specifically trying to detect small objects, there are a few things you can consider to potentially improve the results, such as increasing the training image size so small objects occupy more pixels, double-checking annotation quality for the smallest objects, and trying a larger model variant if your deployment constraints allow it (see the example command below).
Remember to test these modifications incrementally and monitor the changes in metrics to determine their effectiveness. Hope these suggestions help! Let me know if you have any further questions.
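One concrete, hedged example of the image-size change (the dataset YAML name and batch size are placeholders, not from the original comment):

```
python train.py --img 1280 --batch 8 --epochs 300 --data your_dataset.yaml --weights yolov5s.pt
```

Training at a higher --img resolution gives small objects more pixels to work with, at the cost of memory and speed.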
Hi @glenn-jocher, I have trained my YOLO model for 100 epochs (I initially set 100 epochs) and it saved a best.pt file. Since the mAP did not increase much, I decided to train for another 100 epochs instead of starting from zero. How can I continue training from epoch 100?
@Asshok1 hi there! 😊 To continue training from your 100th epoch, simply use the `--resume` flag: `python train.py --resume`. This will automatically pick up where you left off. If you've moved your training checkpoint or have multiple sessions, you may need to point to the specific checkpoint: `python train.py --weights path/to/your/last.pt`. Happy training! Let me know if you need more help.
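To spell out the two cases as a sketch (the paths and data YAML are placeholders):

```
# Resume the most recent interrupted run exactly where it stopped
python train.py --resume

# Resume a specific interrupted checkpoint
python train.py --resume path/to/your/last.pt

# Train additional epochs after a run has already finished:
# start a new run from the saved weights with a fresh --epochs value
python train.py --weights path/to/your/last.pt --epochs 100 --data your_dataset.yaml
```

Note that --resume continues an unfinished run only up to its originally requested epoch count, whereas starting from --weights last.pt is the way to add more epochs once a run has completed.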
Is it normal for my precision and recall values to be below 70%? Around 50% to 60%, to be exact, for my project. I have 5 classes and it's about defect detection on solar panels.
@Aliy012 hi there! Precision and recall values in the 50-60% range can occur depending on your dataset complexity and annotation quality. For solar panel defect detection (which often involves small objects), we recommend increasing the training image size so small defects occupy more pixels and double-checking annotation consistency on the smallest defects.
If you've already followed our Dataset Guidelines, you might need to expand your training data or try hyperparameter evolution (`--evolve`); a sketch of the command follows below.
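For reference, hyperparameter evolution is just a normal training command with --evolve appended; the example below is a sketch with placeholder dataset/weights names and a short per-generation epoch count:

```
python train.py --img 640 --epochs 10 --data your_dataset.yaml --weights yolov5s.pt --cache --evolve 300
```

Evolution re-trains the model many times (300 generations here), so it is expensive; run it only after the dataset itself looks healthy.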
I was considering larger models, but I need to deploy the model and run real-time tests. However, I'm a bit confused about which graph I should check for overfitting or underfitting: cls_loss, box_loss, or dfl_loss? And lastly, how do I check IoU, and between recall and precision, which one should be higher?
Hi there! For overfitting checks, monitor all validation losses (val/box_loss, val/cls_loss, val/dfl_loss): overfitting shows up as validation losses rising while the training losses keep falling. For IoU, review the mAP@0.5 and mAP@0.5:0.95 metrics, which summarize detection quality across IoU thresholds. For deployment, consider model size/speed tradeoffs: larger models (YOLOv5x) are more accurate but slower. If real-time speed is critical, test smaller variants (YOLOv5s) at your target input size on your deployment hardware.
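If you are on YOLOv5, one direct way to check the IoU-based metrics is to run the validation script on your best weights and read off mAP@0.5 and mAP@0.5:0.95 (the latter averages over IoU thresholds from 0.5 to 0.95); the paths below are placeholders:

```
python val.py --weights runs/train/exp/weights/best.pt --data your_dataset.yaml --img 640
```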
Hi there! Based on your results, if validation losses (val/box_loss, val/cls_loss) are rising while training losses decrease, this suggests overfitting. An IoU below 0.5, consistent with the Intersection over Union (IoU) explanation, indicates localization inaccuracies. To improve, verify your labels using the train_batch*.jpg images saved in your run directory, and consider increasing the training image size so small defects are easier to localize.
What do you mean by verifying labels using train_batch.jpg? I'm currently using Google Colab and I don't know what else I can do to improve my results.
Hi there! 😊 To verify labels using train_batch*.jpg, open the mosaic images that are saved in your run directory at the start of training and visually confirm that the drawn boxes line up with the objects; missing or shifted boxes are usually easy to spot there. To improve results further, try increasing the training image size (--img) if memory allows and experimenting with the augmentation hyperparameters in the bundled hyp.scratch YAML files.
See the YOLOv5 Training Tips for label verification details. Let us know how it goes!
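Since you are on Colab, a quick way to eyeball those mosaics (assuming the default output path; adjust it to your actual run folder) is:

```python
# Show the label-verification mosaics YOLOv5 saves at the start of training.
# "runs/train/exp" is an assumption -- point it at your actual run folder.
import glob
from IPython.display import Image, display

for path in sorted(glob.glob("runs/train/exp/train_batch*.jpg")):
    print(path)
    display(Image(filename=path, width=900))
```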
Thank you for helping me a lot! But where can I get the hyp.scratch.yaml file in Google Colab? I currently increased the imgsz but it said it's out of memory.
Hi there! The hyp.scratch.yaml hyperparameter file ships with the YOLOv5 repository itself, so once you've cloned the repo in Colab you'll find the hyperparameter YAMLs under the data/hyps directory (named hyp.scratch-low.yaml, hyp.scratch-med.yaml, and hyp.scratch-high.yaml in recent versions). For the out-of-memory error, lower the --batch size when you increase imgsz.
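As a sketch (the dataset name and batch size are placeholders, and the hyp path assumes a recent YOLOv5 checkout), the hyperparameter file and a smaller batch can be combined in one command:

```
python train.py --img 1280 --batch 4 --data your_dataset.yaml --weights yolov5s.pt --hyp data/hyps/hyp.scratch-low.yaml
```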
I see. I pray God blesses you always for not giving up on helping me!! I will try your suggestions and hope they work :'(
You're very welcome! We're glad to help. For continued guidance, refer to the YOLOv5 Training Guide which covers best practices for improving performance. The community is here to support your project - feel free to share updates on your progress! 🚀
Is it necessary to use COCO? Currently I download datasets in the format I want, for example YOLOv9 format.
Hi Pderrenger! Is there any other way to make my train_loss graph converge?
@Aliy012 hi! For better training loss convergence, first verify your dataset quality using the train_batch*.jpg visualizations mentioned above.
❔Question
Hello @glenn-jocher,
Which graph shows whether the training run I am doing is overfitting or not (val/box_loss, val/obj_loss, etc.), or is there another method to check? Could you also give me a short explanation of the difference between val/box_loss and val/obj_loss?
many thanks
Additional context