‘Noise Stability Optimization for Flat Minima With Optimal Convergence Rates’

“We consider finding flat, local minimizers by adding average weight perturbations. Given a nonconvex function f: ℝ^d → ℝ and a d-dimensional distribution P which is symmetric at zero, we perturb the weight of f and define F(W) = 𝔼[f(W + U)], where U is a random sample from P. This injection induces regularization through the Hessian trace of f for small, isotropic Gaussian perturbations. … Still, convergence rates are not known for finding minima under the average perturbations of the function F. This paper considers an SGD-like algorithm that injects random noise before computing gradients while leveraging the symmetry of P to reduce variance.”
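
The idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes P is an isotropic Gaussian with standard deviation `sigma`, and it assumes that "leveraging the symmetry of P" means averaging the gradients at W + U and W − U, which is one natural reading. The function names, the double-well test objective, and all hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np


def two_point_noisy_grad(grad, w, sigma, rng):
    """Estimate the gradient of F(W) = E[f(W + U)] from one noise sample.

    Assumption: U ~ N(0, sigma^2 I). Because this P is symmetric at zero,
    -U has the same distribution as U, so averaging the gradients at
    W + U and W - U stays unbiased while odd-order noise terms cancel.
    """
    u = sigma * rng.standard_normal(w.shape)
    return 0.5 * (grad(w + u) + grad(w - u))


def noise_injected_sgd(grad, w0, sigma=0.1, lr=0.05, steps=500, seed=0):
    """Plain gradient steps on F, using the two-point estimator above."""
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w = w - lr * two_point_noisy_grad(grad, w, sigma, rng)
    return w


def double_well_grad(w):
    """Gradient of the hypothetical objective f(w) = sum(w^4 / 4 - w^2 / 2)."""
    return w**3 - w


if __name__ == "__main__":
    w_final = noise_injected_sgd(double_well_grad, w0=[1.5, -1.5])
    print("final iterate:", w_final)
```

For a quadratic f the gradient is linear, so the two perturbed gradients average back to the unperturbed gradient exactly and the noise has no effect. For the double well, the noise shifts the stationary points of F slightly away from those of f, which is the kind of curvature-dependent regularization the abstract describes.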

Find the paper and the full list of authors on arXiv.
