This repository was archived by the owner on Jul 10, 2021. It is now read-only.
Per-layer learning rates and weight decays #100
Open
With the advent of hyperparameter tuners like Spearmint and SigOpt, and with recent results such as http://arxiv.org/abs/1502.03492, it seems useful to expose per-layer learning rates and weight decays. Is this a possibility?
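
For illustration only, here is a minimal sketch of what such an API might look like, assuming a plain-SGD update written in NumPy. The function and parameter names (`sgd_step`, `layer_opts`, etc.) are hypothetical and not part of this library; the idea is simply that each layer can override the global learning rate and weight decay:

```python
import numpy as np

def sgd_step(params, grads, layer_opts, default_lr=0.01, default_decay=0.0):
    """Update `params` in place, one SGD step.

    params     : dict mapping layer name -> weight array
    grads      : dict mapping layer name -> gradient array (same shapes)
    layer_opts : dict mapping layer name -> {"lr": ..., "decay": ...};
                 layers absent from it fall back to the global defaults.
    """
    for name, w in params.items():
        opts = layer_opts.get(name, {})
        lr = opts.get("lr", default_lr)
        decay = opts.get("decay", default_decay)
        # L2 weight decay folded into the gradient before the update.
        w -= lr * (grads[name] + decay * w)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = {"conv1": rng.normal(size=(3, 3)), "fc": rng.normal(size=(3,))}
    grads = {k: rng.normal(size=v.shape) for k, v in params.items()}
    # Example fine-tuning setup: a small learning rate on the pretrained
    # conv layer, a larger rate plus decay on the fresh classifier layer.
    sgd_step(params, grads,
             {"conv1": {"lr": 1e-4},
              "fc": {"lr": 1e-2, "decay": 1e-4}})
```

A per-layer override dict like this would also compose naturally with external tuners such as Spearmint or SigOpt, since each layer's rate and decay becomes an independently tunable hyperparameter.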