A cleaner pattern to make custom_objects simpler #7

Open
isaacgerg opened this issue Sep 5, 2019 · 4 comments
Labels
enhancement New feature or request

Comments

@isaacgerg

import tensorflow as tf

class categorical_focal_loss:
    '''
    Softmax version of focal loss.

           m
      FL = sum  -alpha * (1 - p_o,c)^gamma * y_o,c * log(p_o,c)
          c=1

      where m = number of classes, c = class and o = observation

    Parameters:
      alpha -- the same as the weighting factor in balanced cross entropy
      gamma -- focusing parameter for modulating factor (1-p)

    Default value:
      gamma -- 2.0 as mentioned in the paper
      alpha -- 0.25 as mentioned in the paper

    References:
        Official paper: https://arxiv.org/pdf/1708.02002.pdf
        https://www.tensorflow.org/api_docs/python/tf/keras/backend/categorical_crossentropy

    Usage:
     model.compile(loss=[categorical_focal_loss(alpha=.25, gamma=2)], metrics=["accuracy"], optimizer="adam")
    '''
    def __init__(self, gamma=2., alpha=.25):
        self._gamma = gamma
        self._alpha = alpha
        self.__name__ = 'categorical_focal_loss'
        
    def __int_shape(self, x):
        # Helper (not used by __call__); tf.keras always runs on the
        # TensorFlow backend, so the static shape is available here.
        return tf.keras.backend.int_shape(x)
    
    def __call__(self, y_true, y_pred):
        '''
        :param y_true: A tensor of the same shape as `y_pred`
        :param y_pred: A tensor resulting from a softmax
        :return: Output tensor.
        '''

        # Scale predictions so that the class probas of each sample sum to 1
        y_pred /= tf.keras.backend.sum(y_pred, axis=-1, keepdims=True)

        # Clip the prediction value to prevent NaN's and Inf's
        epsilon = tf.keras.backend.epsilon()
        y_pred = tf.keras.backend.clip(y_pred, epsilon, 1. - epsilon)

        # Calculate Cross Entropy
        cross_entropy = -y_true * tf.keras.backend.log(y_pred)

        # Calculate Focal Loss
        loss = self._alpha * tf.keras.backend.pow(1 - y_pred, self._gamma) * cross_entropy

        # Sum over the class axis to get one focal loss value per sample
        return tf.keras.backend.sum(loss, axis=-1)

With this pattern, I don't need dill when using load_model.
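
For illustration, a minimal sketch of what loading might look like with this pattern (the model file name is a placeholder; the custom_objects key matches the __name__ set in __init__):

    import tensorflow as tf

    # 'model.h5' is a hypothetical path; categorical_focal_loss is the class above.
    model = tf.keras.models.load_model(
        'model.h5',
        custom_objects={'categorical_focal_loss': categorical_focal_loss(gamma=2., alpha=.25)},
    )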

@umbertogriffo added the enhancement label Apr 16, 2020
@umbertogriffo
Owner

Hi @isaacgerg, thanks for the suggestion. Could you open a PR?

@isaacgerg
Author

Unfortunately, I cannot due to my network settings.

@dmonkoff

dmonkoff commented Dec 5, 2020

Hey,
I've seen this implementation in a lot of projects, and I feel it is not right to set the alpha parameter that way. Alpha as a scalar makes sense in the binary case, since it is the weight of the positive samples, i.e. loss = -gt * alpha * ((1 - pr)^gamma) * log(pr) - (1 - gt) * (1 - alpha) * (pr^gamma) * log(1 - pr). Positive samples are weighted by alpha, negative samples by 1 - alpha; all good.
With multiple classes, there is no distinguished 'negative' class: you output a softmax vector of class probabilities. So, following the same logic as in the binary case, each class should be weighted separately. If you use a single value, you are basically just scaling the loss by that factor; no class weighting is done.
I might be missing something here, so please correct me if I'm wrong.

Edit:
I've checked the function more carefully, and you can pass a vector of alphas, so that was my misunderstanding. I would still add a better description of alpha to the function and change the default alpha to something more appropriate, for example alpha=1, so that all classes are weighted equally and no scaling is done. alpha=0.25 doesn't make much sense in the multiclass case.
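
As a small sketch of the difference (values are illustrative): a scalar alpha multiplies every class term by the same constant, while a per-class vector broadcasts over the class axis and reweights each class.

    import numpy as np

    y_true = np.array([[0., 1., 0.]])        # one-hot target, 3 classes
    y_pred = np.array([[0.2, 0.7, 0.1]])     # softmax output
    gamma = 2.0

    # Same focal term as in the implementation above
    focal_term = (1 - y_pred) ** gamma * (-y_true * np.log(y_pred))

    scalar_loss = 0.25 * focal_term                        # uniform scaling only
    vector_loss = np.array([0.2, 0.5, 0.3]) * focal_term   # per-class weighting

    print(scalar_loss.sum(axis=-1), vector_loss.sum(axis=-1))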

@umbertogriffo
Owner

umbertogriffo commented Dec 6, 2020

Hey @dmonkoff,
You are right; this is an older implementation. The issue has already been solved here: as I've already written, in the latest version you need to specify α as an array whose size is consistent with the number of categories, each entry representing the weight of the corresponding category.

I'll take your suggestions into account, thank you very much!
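
For a 3-class model, a sketch of what a compile call might look like under that array convention (the values are illustrative, not a definitive API):

    model.compile(
        loss=[categorical_focal_loss(alpha=[[.25, .25, .25]], gamma=2)],
        metrics=["accuracy"],
        optimizer="adam",
    )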
