Unable to reproduce the results #11

sanshuiii · 2023-12-02T11:45:38Z

We tried to reproduce the results on the original Volleyball dataset but could not. We used the two-stage training strategy described in the README and ran it twice, with the following results:

| No. | Stage 1 acc | Stage 2 acc |
|---|---|---|
| paper | | 93.7% |
| 1 | Test Prec@1: 93.044 % | Test Prec@1: 93.119 % |
| 2 | Test Prec@1: 93.044 % | Test Prec@1: 92.595 % |
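
For scale, the percentages above can be converted back into clip counts. The sketch below assumes the standard Volleyball test split of 1337 clips, which is not stated in this thread but reproduces all three reported accuracies to three decimals. `TEST_CLIPS` and `correct_clips` are names chosen here for illustration.

```python
# Assumption: the standard Volleyball test split has 1337 clips.
TEST_CLIPS = 1337


def correct_clips(acc_percent: float, total: int = TEST_CLIPS) -> int:
    """Return the number of correctly classified clips implied by an accuracy."""
    return round(acc_percent / 100 * total)


# Accuracies copied from the table in this post.
reported = {
    "run 1, stage 1": 93.044,
    "run 1, stage 2": 93.119,
    "run 2, stage 2": 92.595,
    "paper": 93.7,
}

for name, acc in reported.items():
    n = correct_clips(acc)
    print(f"{name}: {n}/{TEST_CLIPS} = {100 * n / TEST_CLIPS:.3f} %")
```

Under that assumption, stage 2 gained one clip in run 1 and lost six in run 2, and the paper's figure corresponds to about nine more correct clips than our stage 1. If `seed: -1` means the seed is not fixed, differences of this size may be partly run-to-run noise.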

I am wondering whether we are using the wrong configs, since in our second trial the stage 2 accuracy even decreased. The commands and configs we used are as follows:

python main.py train --mode train --cfg configs/volleyball_stage_1.yml
python main.py train --mode train --cfg configs/volleyball_stage_2.yml --load_pretrained 1 --checkpoint checkpoints/xxxxxxxxx.pth

stage 1

exp_name: composer_vd_original


# -- Dataset settings
dataset_name: volleyball
dataset_dir: /home/guangyi.chen/workspace/yifan/composer/volleyball/volleyball
olympic_split: False
ball_trajectory_use: True
joints_folder_name: joints
tracklets_file_name: tracks_normalized.pkl
person_action_label_file_name: tracks_normalized_with_person_action_label.pkl
ball_trajectory_folder_name: volleyball_ball_annotation

horizontal_flip_augment: True
horizontal_flip_augment_purturb: True
horizontal_move_augment: True
horizontal_move_augment_purturb: True
vertical_move_augment: True
vertical_move_augment_purturb: True
agent_dropout_augment: True

image_h: 720
image_w: 1280
num_classes: 8
num_person_action_classes: 10
frame_start_idx: 5
frame_end_idx: 14
frame_sampling: 1
N: 12 
J: 17
T: 10
recollect_stats_train: True


# -- Training settings
seed: -1
batch_size: 256
num_epochs: 40
num_workers: -1
optimizer: 'adam'
learning_rate: 0.0005
weight_decay: 0.001


# -- Learning objective settings
loss_coe_fine: 1
loss_coe_mid: 1
loss_coe_coarse: 1
loss_coe_group: 1
loss_coe_last_TNT: 3
loss_coe_person: 1
use_group_activity_weights: True
use_person_action_weights: True


# -- Contrastive cluster assignment
nmb_prototypes: 100
temperature: 0.1
sinkhorn_iterations: 3
loss_coe_constrastive_clustering: 1


# -- Model settings
model_type: composer
group_person_frame_idx: 5
joint_initial_feat_dim: 8
joint2person_feat_dim: 2
num_gcn_layers: 3
max_num_tokens: 10
max_times_embed: 100
time_position_embedding_type: absolute_learned_1D 
max_image_positions_h: 1000
max_image_positions_w: 1500
image_position_embedding_type: learned_fourier_2D
# ------ Multiscale Transformer settings
projection_batchnorm: False
projection_dropout: 0
TNT_hidden_dim: 256
TNT_n_layers: 2
innerTx_nhead: 2 
innerTx_dim_feedforward: 1024
innerTx_dropout: 0.5
innerTx_activation: relu 
middleTx_nhead: 8
middleTx_dim_feedforward: 1024
middleTx_dropout: 0.2
middleTx_activation: relu 
outerTx_nhead: 2
outerTx_dim_feedforward: 1024
outerTx_dropout: 0.2
outerTx_activation: relu 
groupTx_nhead: 2
groupTx_dim_feedforward: 1024
groupTx_dropout: 0
groupTx_activation: relu 
# ------ Final classifier settings
classifier_use_batchnorm: False
classifier_dropout: 0



# -- Runtime settings
gpu:
  - 0
  - 1
  - 2
  - 3
#   - 4
#   - 5
#   - 6
#   - 7
dev: 0
  
  
# -- Output settings
checkpoint_dir: ./checkpoints/
log_dir: ./logs/

stage 2

exp_name: composer_vd_original


# -- Dataset settings
dataset_name: volleyball
dataset_dir: /home/guangyi.chen/workspace/yifan/composer/volleyball/volleyball
olympic_split: False
ball_trajectory_use: True
joints_folder_name: joints
tracklets_file_name: tracks_normalized.pkl
person_action_label_file_name: tracks_normalized_with_person_action_label.pkl
ball_trajectory_folder_name: volleyball_ball_annotation

horizontal_flip_augment: True
horizontal_flip_augment_purturb: True
horizontal_move_augment: True
horizontal_move_augment_purturb: True
vertical_move_augment: True
vertical_move_augment_purturb: True
agent_dropout_augment: True

image_h: 720
image_w: 1280
num_classes: 8
num_person_action_classes: 10
frame_start_idx: 5
frame_end_idx: 14
frame_sampling: 1
N: 12 
J: 17
T: 10
recollect_stats_train: False


# -- Training settings
seed: -1
batch_size: 256
num_epochs: 5
num_workers: -1
optimizer: 'adam'
learning_rate: 0.0001
weight_decay: 0.001


# -- Learning objective settings
loss_coe_fine: 1
loss_coe_mid: 1
loss_coe_coarse: 1
loss_coe_group: 1
loss_coe_last_TNT: 3
loss_coe_person: 1
use_group_activity_weights: True
use_person_action_weights: True


# -- Contrastive cluster assignment
nmb_prototypes: 100
temperature: 0.1
sinkhorn_iterations: 3
loss_coe_constrastive_clustering: 1


# -- Model settings
model_type: composer
group_person_frame_idx: 5
joint_initial_feat_dim: 8
joint2person_feat_dim: 2
num_gcn_layers: 3
max_num_tokens: 10
max_times_embed: 100
time_position_embedding_type: absolute_learned_1D 
max_image_positions_h: 1000
max_image_positions_w: 1500
image_position_embedding_type: learned_fourier_2D
# ------ Multiscale Transformer settings
projection_batchnorm: False
projection_dropout: 0
TNT_hidden_dim: 256
TNT_n_layers: 2
innerTx_nhead: 2 
innerTx_dim_feedforward: 1024
innerTx_dropout: 0.5
innerTx_activation: relu 
middleTx_nhead: 8
middleTx_dim_feedforward: 1024
middleTx_dropout: 0.2
middleTx_activation: relu 
outerTx_nhead: 2
outerTx_dim_feedforward: 1024
outerTx_dropout: 0.2
outerTx_activation: relu 
groupTx_nhead: 2
groupTx_dim_feedforward: 1024
groupTx_dropout: 0
groupTx_activation: relu 
# ------ Final classifier settings
classifier_use_batchnorm: False
classifier_dropout: 0



# -- Runtime settings
gpu:
  - 0
  - 1
  - 2
  - 3
#   - 4
#   - 5
#   - 6
#   - 7
dev: 0
  
  
# -- Output settings
checkpoint_dir: ./checkpoints/
log_dir: ./logs/
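
Comparing the two configs key by key shows how little changes between the stages. The sketch below uses a deliberately simple `key: value` line parser (an assumption for illustration; the repository presumably loads these files with a YAML library) on only the training-related keys copied from the two configs above.

```python
def parse_flat_config(text: str) -> dict:
    """Parse top-level `key: value` lines, skipping blanks, comments and list items."""
    config = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or stripped.startswith("-"):
            continue
        key, _, value = stripped.partition(":")
        config[key.strip()] = value.strip()
    return config


def diff_configs(a: dict, b: dict) -> dict:
    """Return {key: (value_in_a, value_in_b)} for every key whose value differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Excerpts of the two configs posted above (values kept as strings).
STAGE_1 = """
recollect_stats_train: True
seed: -1
batch_size: 256
num_epochs: 40
optimizer: 'adam'
learning_rate: 0.0005
weight_decay: 0.001
"""

STAGE_2 = """
recollect_stats_train: False
seed: -1
batch_size: 256
num_epochs: 5
optimizer: 'adam'
learning_rate: 0.0001
weight_decay: 0.001
"""

changed = diff_configs(parse_flat_config(STAGE_1), parse_flat_config(STAGE_2))
for key, (stage_1, stage_2) in changed.items():
    print(f"{key}: stage 1 = {stage_1}, stage 2 = {stage_2}")
```

Only `learning_rate`, `num_epochs` and `recollect_stats_train` differ; every other line of the two configs as posted here is identical.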

JinZhangYu · 2024-02-17T06:38:04Z

I ran into the same problem. Here are the results I reproduced.

But I think I at least understand your problem. You are using 4 GPUs, whereas the default .yaml uses 1 GPU.
That means you multiplied the batch size by 4 without changing the learning rate.
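
If the effective batch size really does grow with the number of GPUs (this depends on how the training script parallelizes; some setups split one fixed batch across the devices instead), a common heuristic is to scale the learning rate linearly with it. The sketch below applies that rule to the stage 1 values from the configs above. `scaled_learning_rate` is a hypothetical helper, not part of the repository.

```python
def scaled_learning_rate(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Linear scaling heuristic: keep the ratio lr / batch_size constant."""
    if base_batch <= 0 or new_batch <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * new_batch / base_batch


base_lr = 0.0005   # learning_rate in the stage 1 config
base_batch = 256   # batch_size in the config, assumed here to be tuned for 1 GPU
num_gpus = 4       # length of the gpu list in the posted config

# Assumption: each GPU processes a full batch of 256, so the total is 4 x 256.
effective_batch = base_batch * num_gpus
new_lr = scaled_learning_rate(base_lr, base_batch, effective_batch)
print(f"effective batch = {effective_batch}, scaled learning rate = {new_lr:g}")
```

Alternatively, dividing `batch_size` by the GPU count would keep the total batch at 256 with the original learning rate. Either change is a hypothesis to test, not a confirmed fix.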

ztstctc · 2024-11-20T14:59:53Z

Why do I get an error about a NaN value when I change the gpu setting from 0 to 0, 1? I also cannot run it on GPU 1 alone, only on GPU 0.
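
One way to narrow this down, independent of the repository's own `gpu` setting, is to expose only the second physical GPU to the process through the `CUDA_VISIBLE_DEVICES` environment variable, so that CUDA sees it as device 0. The sketch below only builds the environment and the command. `env_for_gpu` is a hypothetical helper, the command is the stage 1 command from the original post, and it assumes the `gpu` list in the config holds CUDA device indices.

```python
import os


def env_for_gpu(physical_index: int) -> dict:
    """Copy the current environment and expose a single physical GPU to CUDA."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(physical_index)
    return env


# Stage 1 command from the original post, unchanged.
command = ["python", "main.py", "train", "--mode", "train",
           "--cfg", "configs/volleyball_stage_1.yml"]

env = env_for_gpu(1)
print("CUDA_VISIBLE_DEVICES =", env["CUDA_VISIBLE_DEVICES"])
print(" ".join(command))
# To actually launch the run: subprocess.run(command, env=env, check=True)
```

With the config's `gpu` list set to just `0`, this should run on physical GPU 1. If that works but `gpu: 0, 1` still produces NaN, the multi-GPU code path becomes the more likely suspect than the second card itself.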
