Machine Learning Faculty Publications

Learning Sampling Policy to Achieve Fewer Queries for Zeroth-Order Optimization

Zhou Zhai, Nanjing University of Information Science & Technology
Wanli Shi, Mohamed Bin Zayed University of Artificial IntelligenceFollow
Heng Huang, University of Maryland, College Park
Yi Chang, Jilin University
Bin Gu, Mohamed Bin Zayed University of Artificial IntelligenceFollow

Document Type

Conference Proceeding

Publication Title

Proceedings of Machine Learning Research

Abstract

Zeroth-order (ZO) methods, which use the finite difference of two function evaluations (also called ZO gradient) to approximate first-order gradient, have attracted much attention recently in machine learning because of their broad applications. The accuracy of the ZO gradient highly depends on how many finite differences are averaged, which are intrinsically determined by the number of perturbations randomly drawn from a distribution. Existing ZO methods try to learn a data-driven distribution for sampling the perturbations to improve the efficiency of ZO optimization (ZOO) algorithms. In this paper, we explore a new and parallel direction, i.e., learn an optimal sampling policy instead of using a random strategy to generate perturbations based on the techniques of reinforcement learning (RL), which makes it possible to approximate the gradient with only two function evaluations. Specifically, we first formulate the problem of learning a sampling policy as a Markov decision process. Then, we propose our ZO-RL algorithm, i.e., using deep deterministic policy gradient, an actor-critic RL algorithm to learn a sampling policy that can guide the generation of perturbed vectors in getting ZO gradients as accurately as possible. Importantly, the existing ZOO algorithms for learning a distribution can be plugged in to improve the exploration of ZO-RL. Experimental results with different ZO estimators show that our ZO-RL algorithm can effectively reduce the query complexity of ZOO algorithms and converge faster than existing ZOO algorithms, especially in the later stage of the optimization process.

First Page

1162

Last Page

1170

Publication Date

1-1-2024

Recommended Citation

Z. Zhai et al., "Learning Sampling Policy to Achieve Fewer Queries for Zeroth-Order Optimization," Proceedings of Machine Learning Research, vol. 238, pp. 1162 - 1170, Jan 2024.

This document is currently not available here.

COinS

Machine Learning Faculty Publications

Learning Sampling Policy to Achieve Fewer Queries for Zeroth-Order Optimization

Document Type

Publication Title

Abstract

First Page

Last Page

Publication Date

Recommended Citation

Browse

Contribute

Links

Machine Learning Faculty Publications

Learning Sampling Policy to Achieve Fewer Queries for Zeroth-Order Optimization

Authors

Document Type

Publication Title

Abstract

First Page

Last Page

Publication Date

Recommended Citation

Share

Browse

Contribute

Links