feat: add implementation of Lambda regularization #216

Open
Neonkraft wants to merge 14 commits into main

Conversation

@Neonkraft (Contributor) commented on Feb 7, 2025:

  • Implemented Lambda-DARTS for DARTS, NB201 and TNB101.
  • Added tests.

@Neonkraft Neonkraft requested a review from abhash-er February 7, 2025 16:23
@Neonkraft Neonkraft changed the title from "feat: add initial implementation of Lambda regularization" to "feat: add implementation of Lambda regularization" on Feb 12, 2025
@@ -457,6 +458,9 @@ def _train_epoch( # noqa: C901
if isinstance(unwrapped_network, LayerAlignmentScoreSupport):
    unwrapped_network.update_layer_alignment_scores()

if isinstance(unwrapped_network, LambdaDARTSSupport):
    unwrapped_network.add_lambda_regularization(base_inputs, base_targets)
Collaborator commented on the diff:

You missed adding the criterion here, with its parameters.

Author replied:

Nice catch, thanks!
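
(For reference, the fix presumably just threads the loss criterion through this call; the exact final signature is an assumption here, not the merged code:)

    # assumed final form of the call, with the criterion passed along
    if isinstance(unwrapped_network, LambdaDARTSSupport):
        unwrapped_network.add_lambda_regularization(base_inputs, base_targets, criterion)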

@@ -34,6 +38,7 @@ def __init__(
    lora_toggler: LoRAToggler | None = None,
    is_arch_attention_enabled: bool = False,
    regularizer: Regularizer | None = None,
    lambda_regularizer: LambdaReg | None = None,
@abhash-er (Collaborator) commented on the diff, Feb 13, 2025:

I noticed that the Lambda-DARTS regularization is always turned on. I suppose you have written the function disable_lambda_darts(). You could disable it by default, and enable it only when a lambda_regularizer is passed (i.e., when it is not None).

@Neonkraft (author) replied:

LambdaReg has enabled=True by default. But LambdaReg is only instantiated when the user sets use_lambda_regularizer to True in the profile. Further, the user can configure the lambda_regularizer and disable it.

I found another bug when looking into this, actually. The code would crash when use_lambda_regularizer=False. I've fixed that.
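
(As a minimal sketch of the behaviour described above, with names assumed from this discussion rather than taken from the actual code:)

    from dataclasses import dataclass

    @dataclass
    class LambdaReg:
        enabled: bool = True       # enabled by default once instantiated
        epsilon: float = 1e-3      # assumed perturbation scale

    def build_lambda_regularizer(profile) -> LambdaReg | None:
        # Only instantiate when the profile explicitly opts in; downstream code
        # must tolerate a None regularizer (the crash mentioned above).
        if not getattr(profile, "use_lambda_regularizer", False):
            return None
        return LambdaReg(**getattr(profile, "lambda_regularizer_config", {}))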

@abhash-er (Collaborator) replied:

Yup, that's what I was talking about earlier. I was testing the case when use_lambda_regularizer=False, and found that the regularisation was still occurring.

@abhash-er (Collaborator) commented on Feb 13, 2025:

I'm happy with the workflow of the Lambda-DARTS regularization, except for one thing that is missing in our code. While calculating the forward and backward grads, they don't use softmax at all (they save the softmaxed params and restore them later), but we never do that. They do that so that there is no softmax backward in their grads, as far as I understand, right? But we still account for it in our code. Do you see the problem?

Apart from that, there are a few minor changes that need to be made.

@Neonkraft (author) commented on Feb 17, 2025:

> While calculating the forward and backward grads, they don't use softmax at all (they save the softmaxed params and restore them later), but we never do that. They do that so that there is no softmax backward in their grads, as far as I understand, right? But we still account for it in our code. Do you see the problem?

When they "save" the arch parameters, they also update their values to have the softmax normalized values:

    def softmax_arch_parameters(self):
        self._save_arch_parameters()
        for p in self._arch_parameters:
            p.data.copy_(F.softmax(p, dim=-1))
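
(For reference, the save/restore pair used here presumably looks roughly like the following; this is a sketch, not the authors' exact code:)

    def _save_arch_parameters(self):
        # keep a copy of the raw (pre-softmax) arch parameters
        self._saved_arch_parameters = [p.clone() for p in self._arch_parameters]

    def _restore_arch_parameters(self):
        # put the raw values back after the perturbed passes
        for i, p in enumerate(self._arch_parameters):
            p.data.copy_(self._saved_arch_parameters[i])
        del self._saved_arch_parameters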

Later, in the forward pass, they distinguish between the model step and the architect step as follows:

    def forward(self, input, updateType='alpha', pert=None):
        s0 = s1 = self.stem(input)
        self.weights['normal'] = []
        self.weights['reduce'] = []
        for i, cell in enumerate(self.cells):
            if cell.reduction:
                if updateType == 'weight':
                    weights = self.alphas_reduce.clone() # Don't have to softmax this because it's already been softmaxed
                else:
                    weights = F.softmax(self.alphas_reduce, dim=-1)
            else:
                if updateType == 'weight':
                    weights = self.alphas_normal.clone() # Here too!
                else:
                    weights = F.softmax(self.alphas_normal, dim=-1)
            if self.training:
                weights.retain_grad()
                self.weights['reduce' if cell.reduction else 'normal'].append(weights)
            if pert:
                weights = weights - pert[i]
            s0, s1 = s1, cell(s0, s1, weights, self.drop_path_prob)
        out = self.global_pooling(s1)
        logits = self.classifier(out.view(out.size(0),-1))
        return logits

It's not clear to me why they do it this way. As far as I can see, removing "updateType" and having the following code should be identical:

    for i, cell in enumerate(self.cells):
        if cell.reduction:
            weights = F.softmax(self.alphas_reduce, dim=-1)
        else:
            weights = F.softmax(self.alphas_normal, dim=-1)
        if self.training:
            weights.retain_grad()
            self.weights['reduce' if cell.reduction else 'normal'].append(weights)
        if pert:
            weights = weights - pert[i]

This ultimately boils down to the implementation of Equations 12 and 13 in the paper. We need the gradients of the arch weights for each cell to calculate the perturbations. These perturbations are then applied in the additional forward and backward passes, and the result is used to calculate the lambda regularization terms, which are applied directly to the parameters of the model.
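
(To make the loop concrete, here is a rough sketch of what I mean; all names are illustrative, the cell-ordering bookkeeping is glossed over, and the exact direction and scale of the perturbations and of the correction come from Equations 12 and 13 rather than the stand-ins below:)

    import torch

    def weight_grads_with_pert(model, criterion, inputs, targets, perts):
        # Forward with the per-cell perturbations applied (the `pert` argument of
        # the authors' forward() above) and return d(loss)/d(model weights).
        loss = criterion(model(inputs, pert=perts), targets)
        return torch.autograd.grad(loss, list(model.parameters()), allow_unused=True)

    def lambda_regularization_step(model, criterion, inputs, targets, epsilon=1e-3):
        # 1) Ordinary forward/backward; retain_grad() in the forward pass makes the
        #    per-cell grads of the softmaxed arch weights available afterwards.
        criterion(model(inputs), targets).backward()
        per_cell_weights = model.weights['normal'] + model.weights['reduce']
        grads = [w.grad.detach() for w in per_cell_weights]

        # 2) Turn the per-cell gradients into per-cell perturbations.
        perts = [epsilon * g / (g.norm() + 1e-12) for g in grads]

        # 3) Two extra passes with +pert and -pert; the scaled difference of the
        #    resulting weight gradients gives the lambda regularization term, which
        #    is added directly to the model parameters' gradients before the
        #    optimizer step.
        g_plus = weight_grads_with_pert(model, criterion, inputs, targets, perts)
        g_minus = weight_grads_with_pert(model, criterion, inputs, targets,
                                         [-p for p in perts])
        for p, gp, gm in zip(model.parameters(), g_plus, g_minus):
            if p.grad is not None and gp is not None and gm is not None:
                p.grad.add_((gp - gm) / (2 * epsilon))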

Can you think of any reasoning for doing it the way the authors have, as opposed to the way given above?

@abhash-er (Collaborator) commented on Feb 17, 2025:

> Can you think of any reasoning for doing it the way the authors have, as opposed to the way given above?

As I understand it, when they save the softmaxed parameters, they make sure (at that point) that the softmax backward operation does not enter the computational graph. That's the only difference I can point out.

That means that, in their implementation, they only wanted to differentiate up to the softmaxed arch weights (sigmoid(arch)), and did not want the term (1 - sigmoid(arch)) (when the update type is weight)?

I agree that the formulation looks the same for us and for them. But the question remains: how does this one difference really affect the formulation of Equations 12 and 13?
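
(A tiny, self-contained illustration of that graph difference, independent of the DARTS code; the tensors and loss below are made up purely for demonstration:)

    import torch
    import torch.nn.functional as F

    alpha = torch.randn(3, requires_grad=True)
    c = torch.tensor([1.0, 2.0, 3.0])  # arbitrary downstream coefficients

    # (a) softmax stays inside the graph: the gradient w.r.t. alpha is routed
    #     through the softmax Jacobian.
    grad_a = torch.autograd.grad((F.softmax(alpha, dim=-1) * c).sum(), alpha)[0]

    # (b) the save-and-overwrite trick: the leaf tensor already *holds* the softmaxed
    #     values, so the gradient is taken at that point, with no softmax backward.
    beta = alpha.detach().clone().requires_grad_(True)
    beta.data.copy_(F.softmax(alpha, dim=-1))
    grad_b = torch.autograd.grad((beta * c).sum(), beta)[0]

    print(grad_a)  # Jacobian-weighted gradient
    print(grad_b)  # tensor([1., 2., 3.]) -- just c; the softmax backward is gone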

@Neonkraft (author) replied:

> and did not want the term (1 - sigmoid(arch)) (when the update type is weight)?

I'm not sure what you mean by this. Can you explain?

@abhash-er (Collaborator) commented on Feb 17, 2025:

> and did not want the term (1 - sigmoid(arch)) (when the update type is weight)?

> I'm not sure what you mean by this. Can you explain?

I meant the gradient term of the sigmoid: for an input x, d(sigmoid(x)) = sigmoid(x) * (1 - sigmoid(x)) dx. So with their code, they are only considering the first term.

@Neonkraft (author) replied:

The sigma in the equation denotes the softmax operation, not the sigmoid activation function.

@abhash-er (Collaborator) replied:

> The sigma in the equation denotes the softmax operation, not the sigmoid activation function.

Yeah, but the expression remains the same for softmax as well.
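
(For what it's worth, the diagonal of the softmax Jacobian does have the same form as the sigmoid derivative; with $\sigma_i = \mathrm{softmax}(x)_i$,

    \frac{\partial \sigma_i}{\partial x_j} = \sigma_i \, (\delta_{ij} - \sigma_j),

so the $i = j$ terms are $\sigma_i (1 - \sigma_i)$, while the off-diagonal terms are $-\sigma_i \sigma_j$. That Jacobian factor is exactly what stays in the graph when the softmaxed parameters are not saved and restored.)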
