Machine Learning Faculty Publications

Learning Sampling Policy for Faster Derivative Free Optimization

Zhou Zhai, Nanjing University of Information Science & Technology
Bin Gu, Mohamed bin Zayed University of Artificial Intelligence & JD Finance America Corporation, USA
Heng Huang, University of Pittsburgh & JD Finance America Corporation

Document Type

Article

Publication Title

arXiv

Abstract

Zeroth-order (ZO, also known as derivative-free) methods, which estimate the gradient using only two function evaluations, have recently attracted much attention because of their broad applications in the machine learning community. The perturbations for the two function evaluations are normally drawn at random from a standard Gaussian distribution. To speed up ZO methods, many approaches, such as variance-reduced stochastic ZO gradients and learning an adaptive Gaussian distribution, have recently been proposed to reduce the variance of ZO gradients. However, it remains an open problem whether there is room to further improve the convergence of ZO methods. To explore this problem, we propose a new reinforcement learning based ZO algorithm (ZO-RL) that learns the sampling policy for generating the perturbations in ZO optimization instead of using random sampling. To find the optimal policy, we adopt an actor-critic RL algorithm called deep deterministic policy gradient (DDPG) with two neural network function approximators. The learned sampling policy guides the perturbed points in the parameter space so that a more accurate ZO gradient can be estimated. To the best of our knowledge, ZO-RL is the first algorithm to learn the sampling policy using reinforcement learning for ZO optimization, a direction parallel to existing methods. In particular, ZO-RL can be combined with existing ZO algorithms to further accelerate them. Experimental results on different ZO optimization problems show that ZO-RL effectively reduces the variance of ZO gradients by learning a sampling policy and converges faster than existing ZO algorithms in different scenarios. © 2021, CC BY.
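
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of a two-point ZO gradient estimate with random perturbations drawn from a standard Gaussian distribution. It is not the ZO-RL method: the learned sampling policy is not reproduced here. The function name `zo_gradient`, the smoothing parameter `mu`, and the averaging over `num_samples` perturbations are assumptions made for illustration and do not come from the paper.

```python
import numpy as np


def zo_gradient(f, x, mu=1e-3, num_samples=1, rng=None):
    """Estimate the gradient of f at x from function values only.

    Each perturbation u ~ N(0, I) uses two evaluations, f(x + mu * u) and f(x).
    All names and defaults here are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    fx = f(x)  # shared base evaluation
    grad = np.zeros_like(x)
    for _ in range(num_samples):
        u = rng.standard_normal(x.shape)  # random Gaussian direction
        grad += (f(x + mu * u) - fx) / mu * u  # forward-difference estimate
    return grad / num_samples


# Usage: for f(x) = 0.5 * ||x||^2 the true gradient is x itself.
x0 = np.array([1.0, -2.0, 0.5])
estimate = zo_gradient(lambda z: 0.5 * float(z @ z), x0,
                       num_samples=20000, rng=np.random.default_rng(0))
```

A single perturbation gives a very noisy estimate, which is why the sketch averages many of them; this variance is what the methods named in the abstract, including the learned sampling policy, aim to reduce.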

DOI

10.48550/arXiv.2104.04405

Publication Date

4-9-2021

Comments

Preprint: arXiv

Archived with thanks to arXiv
Preprint License: CC BY
Uploaded 24 March 2022

Recommended Citation

Z. Zhai, B. Gu, and H. Huang, "Learning sampling policy for faster derivative free optimization," 2021, arXiv:2104.04405v1

Included in

Computer Sciences Commons
