flow_spatial and flow_temporal cannot both be zero modules at the same time. I think the paper is misleading on this point: if both were zero-initialized, the zero-initialized second conv, flow_temporal, would make the gradient of the first conv, flow_spatial, constantly zero. I think this is why the code reinitializes transformer3d.transformer_blocks[idx].flow_temporal and transformer3d.transformer_blocks[idx].flow_spatial.
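Here is a toy sketch of the argument, not the repo's code. It assumes the two convs are applied back to back with nothing in between (no activation, no residual), and it uses scalar weights in place of conv kernels. The helper names grads and train are made up for illustration.

```python
# Toy scalar stand-ins for the two stacked convs (hypothetical, not the repo's code):
#   h = w1 * x + b1   plays the role of flow_spatial  (first conv)
#   y = w2 * h + b2   plays the role of flow_temporal (second conv)

def grads(params, x, target):
    """Manual backprop of loss = (y - target)**2 through both layers."""
    w1, b1, w2, b2 = params
    h = w1 * x + b1
    y = w2 * h + b2
    dy = 2.0 * (y - target)   # dL/dy
    dh = dy * w2              # gradient reaching the first layer is scaled by w2
    # Returned order: (dL/dw1, dL/db1, dL/dw2, dL/db2)
    return (dh * x, dh, dy * h, dy)


def train(params, steps=50, lr=0.1, x=1.5, target=2.0):
    """Plain SGD on a single sample; returns the final parameters."""
    params = tuple(params)
    for _ in range(steps):
        g = grads(params, x, target)
        params = tuple(p - lr * gp for p, gp in zip(params, g))
    return params


# Both layers zero-initialized: w2 == 0 blocks the gradient to w1 and b1,
# and h == 0 blocks the gradient to w2, so only b2 ever changes.
both_zero = train((0.0, 0.0, 0.0, 0.0))

# First layer reinitialized to a nonzero weight, second still zero:
# w2 gets a gradient on the first step, after which w1 starts moving too.
first_reinit = train((0.5, 0.0, 0.0, 0.0))

print("both zero:    ", both_zero)
print("first reinit: ", first_reinit)
```

In this toy setup the all-zero case never leaves w1 = b1 = w2 = 0, and only the last bias learns. Giving the first layer a nonzero initialization is enough to unblock both layers, while the output at step 0 is still zero because w2 starts at zero.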