FasterRCNN bug with grayscale input #10338

adeschemps · 2021-11-03T15:41:20Z


The following code doesn't fail, even though I believe it should:

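import torch
from pl_bolts.models.detection import FasterRCNN

model = FasterRCNN()
print(model)
# Grayscale batch: 1 image, 1 channel, 256x256 pixels
image = torch.randn((1, 1, 256, 256))
# Runs without error even though the backbone's conv1 expects 3 channels
result = model(image)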

as print(model) returns:

self.model: FasterRCNN(
  (transform): GeneralizedRCNNTransform(
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
      Resize(min_size=(800,), max_size=1333, mode='bilinear')
  )
  (backbone): BackboneWithFPN(
    (body): IntermediateLayerGetter(
      (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
      (bn1): FrozenBatchNorm2d(64, eps=1e-05)
      (relu): ReLU(inplace=True)
      (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
      (layer1): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(64, eps=1e-05)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(64, eps=1e-05)
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(256, eps=1e-05)
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): FrozenBatchNorm2d(256, eps=1e-05)
          )
        )

showing that the first convolution of the backbone expects an input with 3 channels instead of one. This is confirmed by https://github.com/pytorch/vision/blob/3300692c6e7c2023d2f2356a69ec22ca91e38790/torchvision/models/resnet.py#L323:

self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=7, stride=2, padding=3, bias=False)

Because of this, I don't understand what the model is actually computing, which is very confusing. Any help would be much appreciated.
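To make the expected failure concrete, here is a minimal sketch using a standalone convolution with the same configuration as the `conv1` line above. It is not the Bolts model or the torchvision backbone itself, and the tensor names `rgb` and `gray` are only illustrative. Called directly, such a layer should reject a 1-channel input:

```python
import torch
from torch import nn

# Standalone layer with the same configuration as the backbone's conv1.
conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)

rgb = torch.randn(1, 3, 256, 256)   # 3-channel batch
gray = torch.randn(1, 1, 256, 256)  # 1-channel (grayscale) batch

# A 3-channel input passes through as expected.
print(conv1(rgb).shape)  # torch.Size([1, 64, 128, 128])

# A 1-channel input is rejected by the convolution itself.
try:
    conv1(gray)
    failed = False
except RuntimeError as err:
    failed = True
    print(f"RuntimeError: {err}")
```

Since the full model does not raise this error, something before `conv1` must be changing the number of channels.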

i008 · 2021-11-04T21:23:54Z


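As far as I can tell, the input is automatically converted to 3 channels during normalization (see the link below): the per-channel mean and std each have 3 entries, so subtracting them from a 1-channel image broadcasts it to 3 channels. This is not necessarily the desired behavior. One could also explicitly copy the single channel 3 times, or change the architecture.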
https://github.com/pytorch/vision/blob/main/torchvision/models/detection/transform.py#L127
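A minimal sketch of that broadcasting, assuming the normalization step subtracts a per-channel mean and divides by a per-channel std on each image of shape (C, H, W). This is a simplified reproduction, not torchvision's actual code; the statistics are copied from the `Normalize(...)` line in the printed model, and the variable names are illustrative:

```python
import torch

# Statistics taken from the Normalize(...) line printed by the model.
mean = torch.tensor([0.485, 0.456, 0.406])
std = torch.tensor([0.229, 0.224, 0.225])

gray = torch.rand(1, 256, 256)  # one grayscale image, shape (C=1, H, W)

# (3, 1, 1) statistics broadcast against a (1, H, W) image give (3, H, W).
normalized = (gray - mean[:, None, None]) / std[:, None, None]
print(normalized.shape)  # torch.Size([3, 256, 256])

# Explicitly copying the channel 3 times first yields the same values.
explicit = (gray.repeat(3, 1, 1) - mean[:, None, None]) / std[:, None, None]
print(torch.allclose(normalized, explicit))
```

Under this assumption the backbone sees three copies of the grayscale image, each normalized with a different channel's mean and std, so the three channels are not identical to each other.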


tchaton · 2021-11-05T10:37:56Z

Maintainer

Dear @adeschemps,

I believe you should open this issue on Bolts directly.

Best,
T.C
