How to check overfitting #5061

Closed
aseprohman opened this issue Oct 6, 2021 · 25 comments

Labels
question Further information is requested

Comments

@aseprohman

aseprohman commented Oct 6, 2021

❔Question

Hello @glenn-jocher,
which graph shows whether the training process I am running is overfitting (val/box_loss, val/obj_loss, etc.), or is there another method to check? Can you also give me a brief explanation of the difference between val/box_loss and val/obj_loss?
Many thanks

Additional context

[results plot attached]

@glenn-jocher
Member

@aseprohman validation losses should eventually overfit; if they don't, you haven't trained long enough.

For loss descriptions see original YOLO publications:
https://pjreddie.com/publications/

@aseprohman
Author

Which chart should I check to determine overfitting? box_loss or obj_loss?

@glenn-jocher
Copy link
Member

@aseprohman overfitting can occur in any val loss component.

@aseprohman
Author

aseprohman commented Oct 6, 2021

Is there no priority chart for checking overfit? And if either one starts to overfit, does that mean training should be stopped?

@suryasid09

Hello @aseprohman, did you figure out how to tell whether the model is overfitting? Do both val plots need to be checked, or can we say it is overfitting as soon as one val loss is increasing while the training loss decreases? Please let me know.

@glenn-jocher
Member

Hi @suryasid09, to determine if the model is overfitting, it's best to check all the validation loss plots, including val/box_loss, val/obj_loss, and val/cls_loss. If any of these validation losses start to increase while the training loss is still decreasing, it could be an indication that the model is starting to overfit.

However, it's also possible that some validation losses may reach their minimum value and stay there while the training continues to improve, which is known as convergence. In this case, it's not a sign of overfitting, and you should continue training the model until you achieve satisfactory results.

Overall, you should use your best judgement to decide whether the model is overfitting or not based on all the validation loss plots in conjunction with your specific use case and performance goals.
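As a rough illustration (not an official Ultralytics utility), you can also compare the train and val curves directly from the results.csv that YOLOv5 writes into runs/train/exp*/. The path and column names below are assumptions based on a default run; adjust them to match yours:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("runs/train/exp/results.csv")
df.columns = df.columns.str.strip()  # YOLOv5 pads column names with spaces

for name in ("box_loss", "obj_loss", "cls_loss"):
    plt.plot(df["epoch"], df[f"train/{name}"], label=f"train/{name}")
    plt.plot(df["epoch"], df[f"val/{name}"], "--", label=f"val/{name}")

plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()

A val curve that turns upward while its train counterpart keeps falling is the overfitting signature described above.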

@wtjasmine

Hi, if I have already followed all these recommendations on my dataset:

  • Images per class. ≥ 1500 images per class recommended

  • Instances per class. ≥ 10000 instances (labeled objects) per class recommended

  • Image variety. Must be representative of deployed environment. For real-world use cases we recommend images from different times of day, different seasons, different weather, different lighting, different angles, different sources (scraped online, collected locally, different cameras) etc.

  • Label consistency. All instances of all classes in all images must be labelled. Partial labelling will not work.

  • Label accuracy. Labels must closely enclose each object. No space should exist between an object and its bounding box. No objects should be missing a label.

  • Label verification. View train_batch*.jpg on train start to verify your labels appear correct, i.e. see example mosaic.

  • Background images. Background images are images with no objects that are added to a dataset to reduce False Positives (FP). We recommend about 0-10% background images to help reduce FPs (COCO has 1000 background images for reference, 1% of the total). No labels are required for background images.

I tried to train for 300 epochs without changing any of the model's default settings, but training stopped halfway because the metrics showed no improvement for 100 epochs. Although both the validation and training losses were still decreasing, metrics like precision, recall, and mAP seem to be stuck around 0.53. Does this mean training has reached its final result and should stop? Since I am trying to detect small objects in images, could you please suggest some tips to improve the results? Thank you.

@glenn-jocher
Member

@wtjasmine hi,

Based on what you've described, the training seems to have reached a plateau in terms of metrics such as precision, recall, and mAP. This could indicate that the model has converged and further training may not result in significant improvement.

If you are specifically trying to detect small objects, there are a few things you can consider to potentially improve the results:

  1. Data augmentation: Apply augmentation techniques such as random scaling, flipping, rotation, and color jittering to increase the diversity of the training data and improve the model's ability to generalize.

  2. Model architecture: Explore different model architectures that are designed to handle small objects, such as YOLOv4-416 or YOLOv5x. These models have more parameters and may be better suited for detecting small objects.

  3. Model initialization: Check if the model is properly initialized with pre-trained weights. You can start training from pre-trained weights on a similar dataset to help the model converge faster.

  4. Hyperparameter tuning: Experiment with adjusting hyperparameters such as learning rate, batch size, and optimizer to find the optimal configuration for your specific task and dataset.

  5. Data filtering: If your dataset contains a significant number of false positives or irrelevant images, consider filtering out these instances to improve the model's performance on small objects.

Remember to test these modifications incrementally and monitor the changes in metrics to determine their effectiveness.
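For example, points 2–4 above could be combined in a single run along these lines (data.yaml is a placeholder for your dataset file; lower --batch-size if you hit memory limits):

python train.py --data data.yaml --weights yolov5x.pt --imgsz 1280 --epochs 300 --batch-size 8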

Hope these suggestions help! Let me know if you have any further questions.

@Asshok1

Asshok1 commented Apr 15, 2024

Hi @glenn-jocher, I have trained my YOLO model for 100 epochs (I initially set 100 epochs) and the model saved a best.pt file. Since mAP was not increasing much, I decided to run 100 more epochs instead of starting from 0. Now I need to continue from epoch 100; how can I do this?

@glenn-jocher
Member

@Asshok1 hi there! 😊 To continue training from your 100th epoch, simply use the --resume flag with your training command. If you've kept the default runs/train directory without starting new training sessions, the command would look like this:

python train.py --resume

This will automatically pick up where you left off. If you've moved your training checkpoint or have multiple sessions, point --resume at the specific last.pt file like so:

python train.py --resume path/to/your/last.pt

Happy training! Let me know if you need more help.

@Aliy012

Aliy012 commented Feb 28, 2025

Is it normal if my precision and recall values are below 70%? Around 50% to 60%, to be exact, for my project. I have 5 classes, and it is about defect detection on solar panels.

@pderrenger
Member

@Aliy012 hi there! Precision and recall values in the 50-60% range can occur depending on your dataset complexity and annotation quality. For solar panel defect detection (which often involves small objects), we recommend:

  1. Verify labels using train_batch*.jpg mosaics as shown in the Training Tips guide
  2. Try increasing input resolution (--imgsz 1280) to better capture small defects
  3. Consider using larger models (YOLOv5x) if you're currently using smaller variants
  4. Check for class imbalance in your dataset

If you've already followed our Dataset Guidelines, you might need to expand your training data or try hyperparameter evolution (hyp.scratch.yaml). Small objects often benefit from these adjustments.

@Aliy012

Aliy012 commented Mar 1, 2025

I was considering using larger models, but I need to deploy the model and conduct real-time testing. However, I'm a bit confused about which graph I should check to see whether it is overfitting or underfitting: cls_loss, box_loss, or dfl_loss? And lastly, how do I check IoU, and between recall and precision, which one should be higher?

@pderrenger
Member

Hi there! For overfitting checks, monitor all validation losses (val/cls_loss, val/box_loss, val/dfl_loss). If validation losses rise while training losses fall, it suggests overfitting. For underfitting, check if both training and validation losses plateau at high values.

For IoU, review the metrics/mAP_0.5 and mAP_0.5:0.95 curves in your results.png. Precision/recall balance depends on your use case: prioritize higher recall if missing defects is costly, or precision if false positives are problematic.
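To make the IoU threshold concrete, here is a minimal illustrative sketch (not YOLOv5 code) of how IoU between two [x1, y1, x2, y2] boxes is computed:

def iou(box_a, box_b):
    # intersection rectangle
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    # union = sum of areas minus intersection
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou([0, 0, 100, 100], [50, 50, 150, 150]))  # ~0.14, well below the usual 0.5 threshold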

For deployment, consider model size/speed tradeoffs: larger models (YOLOv5x) are more accurate but slower. If real-time speed is critical, test smaller variants (YOLOv5s) with --imgsz optimized for your hardware. See the Ultralytics YOLOv5 Training Guide for detailed metric explanations.

@Aliy012

Aliy012 commented Mar 2, 2025

[Image attached.] From this I consider my model to be overfitting, and the IoU is less than 0.5, which means the model is not precise and not good. Is that true?

@pderrenger
Member

Hi there! Based on your results, if validation losses (val/box_loss, val/cls_loss) are rising while training losses decrease, this suggests overfitting. An IoU below 0.5 (see the Intersection over Union (IoU) explanation) indicates localization inaccuracies. To improve:

  1. Verify labels using train_batch*.jpg mosaics
  2. Increase image size (--imgsz) for small objects
  3. Try hyperparameter tuning or larger models (YOLOv5x)

See our Training Guide for details.

@Aliy012

Aliy012 commented Mar 3, 2025

What do you mean by verifying labels using train_batch*.jpg? Currently I'm using Google Colab and I don't know what else I can do to improve my results.

@pderrenger
Member

Hi there! 😊 To verify labels using train_batch*.jpg, check the mosaic images generated during training in your runs/train/exp directory. These show your training batches with labels overlaid – look for misaligned boxes or missing annotations. For Google Colab, you can view them directly in the file explorer or use from PIL import Image; Image.open('path/to/train_batch0.jpg').show().
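A minimal sketch for viewing those mosaics inline in a Colab notebook (assumes the default runs/train/exp output folder; adjust the glob to your experiment name):

import glob
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

for path in sorted(glob.glob("runs/train/exp/train_batch*.jpg")):
    plt.figure(figsize=(12, 12))
    plt.imshow(mpimg.imread(path))
    plt.title(path)
    plt.axis("off")
    plt.show()  # every object should have a tight, correctly placed box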

To improve results further:

  1. Try larger image sizes (--imgsz 1280) for small defects
  2. Adjust augmentation in hyp.scratch.yaml (increase mosaic or mixup)
  3. Use --evolve for hyperparameter optimization

See the YOLOv5 Training Tips for label verification details. Let us know how it goes!

@Aliy012

Aliy012 commented Mar 4, 2025

Thank you for helping me a lot. But where can I get the hyp.scratch.yaml file in Google Colab? Currently I increased the imgsz, but it said it was out of memory.

@pderrenger
Member

Hi there! The hyp.scratch.yaml file is included in the YOLOv5 repository under the data directory. In Google Colab, you can access it at /content/yolov5/data/hyp.scratch.yaml after cloning the repo. For memory issues when increasing --imgsz, try reducing batch size (--batch 16 or lower) or using smaller models like YOLOv5s. More details in our hyperparameter documentation.
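As a sketch of that combination (your_data.yaml is a placeholder for your dataset file; note that newer YOLOv5 releases keep the scratch hyps under data/hyps/, e.g. hyp.scratch-low.yaml, so adjust the --hyp path to your version):

python train.py --data your_data.yaml --weights yolov5s.pt --imgsz 1280 --batch-size 8 --hyp /content/yolov5/data/hyp.scratch.yaml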

@Aliy012

Aliy012 commented Mar 4, 2025

I see. I pray God blesses you always for not giving up on helping me!! I will try your suggestions and hope they work :'(

@pderrenger
Member

You're very welcome! We're glad to help. For continued guidance, refer to the YOLOv5 Training Guide which covers best practices for improving performance. The community is here to support your project - feel free to share updates on your progress! 🚀

@Aliy012

Aliy012 commented Mar 5, 2025

Is it necessary to use COCO? Because currently I download the dataset in the format I desire, for example YOLOv9 format.

@Aliy012

Aliy012 commented Mar 5, 2025

Hi @pderrenger! Is there any other way to make my train_loss graph converge?

@pderrenger
Member

@Aliy012 hi! For better training loss convergence, first verify your dataset quality using train_batch*.jpg mosaics to check label correctness. If labels are accurate, try adjusting hyperparameters in hyp.scratch.yaml (e.g., learning rate, augmentation settings) or use --evolve for automated optimization. For small objects, increasing --imgsz often helps, but reduce --batch-size if you encounter memory limits. More tips are in our Training Tips Guide. 🚀
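For example, a minimal sketch of the evolution step (your_data.yaml is a placeholder; --evolve accepts an optional number of generations, and each generation trains for --epochs epochs):

python train.py --data your_data.yaml --weights yolov5s.pt --epochs 50 --evolve 300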
