coco-p1 training diverges #13
Training with such a small amount of supervision is sensitive to hyper-parameters; please try batch size 8 and logits weight 3.
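For concreteness, the kind of override this suggestion implies might look like the sketch below; the key names (`SOLVER.IMS_PER_BATCH`, the logits-weight key) are assumptions and may not match this repo's actual config schema.

```python
# Hypothetical config-override sketch for the suggestion above.
# Key names are assumptions; check the repo's coco-p1 config for the real ones.
overrides = dict(
    SOLVER=dict(
        IMS_PER_BATCH=8,         # total batch size of 8 images
    ),
    SSL=dict(
        LOGITS_LOSS_WEIGHT=3.0,  # hypothetical name for the "logits weight" knob
    ),
)
```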
I just corrected the config in the latest commit.
Thanks for the reply, I will try the latest code.
I just tried the latest config, and I added IMS_PER_DEVICE=1 to avoid the assertion error. But training still diverged after about 40k steps. I got a higher result of 16% mAP, but it's still much lower than 19.64%. I noticed that coco-p1 doesn't use multi-scale training; will that influence the final result?
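The assert in question is most likely a batch-size consistency check; below is a sketch of what such a check usually enforces (variable and key names here are assumptions, not this repo's code).

```python
# Sketch of a typical batch/device consistency check (names are assumptions).
num_gpus = 8
ims_per_batch = 8    # total images per training iteration
ims_per_device = 1   # images per GPU; with batch 8 on 8 GPUs this must be 1

assert ims_per_batch == ims_per_device * num_gpus, (
    f"IMS_PER_BATCH ({ims_per_batch}) must equal "
    f"IMS_PER_DEVICE ({ims_per_device}) * num_gpus ({num_gpus})"
)
```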
Indeed... Thanks for the correction, I'll fix it. Multi-scale training affects performance a lot; please use it.
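As a rough illustration, multi-scale training in detectron2/cvpods-style configs is usually enabled by giving MIN_SIZE_TRAIN several shorter-side scales instead of a single one; the exact key path in this repo is an assumption.

```python
# Sketch of a multi-scale training input config (detectron2/cvpods-style keys;
# where these keys live in this repo's config is an assumption).
INPUT = dict(
    MIN_SIZE_TRAIN=(640, 672, 704, 736, 768, 800),  # sample one scale per iteration
    MIN_SIZE_TRAIN_SAMPLING="choice",
    MAX_SIZE_TRAIN=1333,
)
```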
Thanks for your reply. I get 18.49 mAP now, but it's still 1 point lower than the score reported in the paper (19.64±0.34). Can this fluctuation in the result be considered normal?
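For reference, putting the quoted numbers side by side (this is just arithmetic on the values above, not a verdict on the run):

```python
# Arithmetic on the numbers quoted above (illustration only).
reported_mean, reported_std = 19.64, 0.34
observed = 18.49
gap = reported_mean - observed
print(f"gap = {gap:.2f} mAP, i.e. about {gap / reported_std:.1f}x the reported std")
```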
Thanks for your detailed reply! In my situation, inference is carried out 4 times every 2k iterations, not 2 times: both the teacher and the student models are evaluated twice, which is strange. I didn't modify the code; could you reproduce this problem with the official code?
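For context, the expected pattern in teacher-student frameworks is one evaluation of each model per eval period, i.e. two runs; seeing four suggests the evaluation hook fires twice. A minimal sketch of the expected behavior (function and names are hypothetical, not this repo's code):

```python
# Hypothetical sketch of the expected periodic evaluation: one run each for the
# EMA teacher and the student per eval period, i.e. two evaluations, not four.
def maybe_evaluate(iteration: int, eval_period: int, evaluate) -> None:
    if iteration > 0 and iteration % eval_period == 0:
        evaluate("teacher")  # EMA weights
        evaluate("student")  # online weights

# Example: with eval_period=2000, iteration 40000 triggers exactly two evaluations.
maybe_evaluate(40000, 2000, lambda name: print(f"evaluating {name}"))
```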
I tried to reproduce the results under the coco-p1 configuration, but training diverged after 40k steps and I got only 14% mAP, which is far lower than 19.64%. Could you help me, please?