In the source code, only 'validate.py' uses HarDBlock_v2, and the model with HarDBlock_v2 is faster than the one with HarDBlock.
I would like to know the difference between v1 and v2.
Hi, thank you for the feedback. The two blocks are mathematically equivalent; HarDBlock_v2 just reduces the use of concat, which makes it a little faster. It first decomposes each convolution from "many to one" into "one by one" (one partial convolution per input tensor), then merges the partial convolutions that share the same input tensor.
For example, the original HarDBlock computes:
X = Conv( Concat([A, B]) )
Y = Conv( Concat([A, C]) )
In HarDBlock_v2, this becomes:
Z = Conv( A )   // Z has X.shape(1) + Y.shape(1) channels: the A-parts of X and Y
X = Conv( B )
Y = Conv( C )
X += Z[ 0 : X.shape(1) ]
Y += Z[ X.shape(1) : ]
The input concatenations can be eliminated entirely, although the block-level output concat is still required. This is faster than the original because concatenation involves memory copies, which are time-consuming. However, the transformation is not free: the "+=" part also requires additional memory accesses, though fewer than the concatenation would, and training with HarDBlock_v2 is slower than with the original block. So we still hope that PyTorch and TensorRT will support convolutions over "discontinuous tensors", so that concat can become a purely pointer-level operation without any memory copy.
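To make the equivalence concrete, here is a minimal PyTorch sketch of the transformation, not code from the repository: the channel sizes, variable names, and the explicit weight-tying step are illustrative assumptions, chosen only to verify that both forms compute the same result.

import torch
import torch.nn as nn

# Channel sizes are arbitrary, for illustration only.
ch_a, ch_b, ch_c = 8, 8, 8   # channels of inputs A, B, C
ch_x, ch_y = 16, 16          # channels of outputs X, Y

# --- HarDBlock style ("many to one"): each output convolves a concat ---
conv_x = nn.Conv2d(ch_a + ch_b, ch_x, kernel_size=3, padding=1, bias=False)
conv_y = nn.Conv2d(ch_a + ch_c, ch_y, kernel_size=3, padding=1, bias=False)

# --- HarDBlock_v2 style ("one by one"): split each conv by input tensor,
#     then merge the partial convs that read the same input (A) ---
conv_a = nn.Conv2d(ch_a, ch_x + ch_y, kernel_size=3, padding=1, bias=False)
conv_b = nn.Conv2d(ch_b, ch_x, kernel_size=3, padding=1, bias=False)
conv_c = nn.Conv2d(ch_c, ch_y, kernel_size=3, padding=1, bias=False)

# Tie the weights so both paths compute exactly the same function.
# A conv over Concat([A, B]) equals conv(A) + conv(B), where the weight
# tensor is split along the input-channel dimension.
with torch.no_grad():
    conv_a.weight[:ch_x].copy_(conv_x.weight[:, :ch_a])  # A-part of conv_x
    conv_a.weight[ch_x:].copy_(conv_y.weight[:, :ch_a])  # A-part of conv_y
    conv_b.weight.copy_(conv_x.weight[:, ch_a:])         # B-part of conv_x
    conv_c.weight.copy_(conv_y.weight[:, ch_a:])         # C-part of conv_y

a = torch.randn(1, ch_a, 32, 32)
b = torch.randn(1, ch_b, 32, 32)
c = torch.randn(1, ch_c, 32, 32)

# v1: two input concats, two convolutions
x_v1 = conv_x(torch.cat([a, b], dim=1))
y_v1 = conv_y(torch.cat([a, c], dim=1))

# v2: no input concat; one merged conv over A, plus per-input convs and adds
z = conv_a(a)                      # channels [0:ch_x] belong to X, the rest to Y
x_v2 = conv_b(b) + z[:, :ch_x]
y_v2 = conv_c(c) + z[:, ch_x:]

print(torch.allclose(x_v1, x_v2, atol=1e-5))  # True
print(torch.allclose(y_v1, y_v2, atol=1e-5))  # True

The key property used here is that convolution is linear in its input channels, so a convolution over a concatenation splits into a sum of per-input convolutions, and partial convolutions reading the same input tensor (A) can be fused into one wider convolution.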