Multi-Modal Inverse Constrained Reinforcement Learning from a Mixture of Demonstrations
Document Type
Conference Proceeding
Publication Title
Advances in Neural Information Processing Systems
Abstract
Inverse Constrained Reinforcement Learning (ICRL) aims to recover the underlying constraints respected by expert agents in a data-driven manner. Existing ICRL algorithms typically assume that the demonstration data is generated by a single type of expert. However, in practice, demonstrations often comprise a mixture of trajectories collected from various expert agents respecting different constraints, making it challenging to explain expert behaviors with a unified constraint function. To tackle this issue, we propose a Multi-Modal Inverse Constrained Reinforcement Learning (MMICRL) algorithm for simultaneously estimating multiple constraints corresponding to different types of experts. MMICRL constructs a flow-based density estimator that enables unsupervised expert identification from demonstrations, so as to infer the agent-specific constraints. Following these constraints, MMICRL imitates expert policies with a novel multi-modal constrained policy optimization objective that minimizes the agent-conditioned policy entropy and maximizes the unconditioned one. To enhance robustness, we incorporate this objective into the contrastive learning framework. This approach enables imitation policies to capture the diversity of behaviors among expert agents. Extensive experiments in both discrete and continuous environments show that MMICRL outperforms other baselines in terms of constraint recovery and control performance. Our implementation is available at: https://github.com/qiaoguanren/Multi-Modal-Inverse-Constrained-Reinforcement-Learning.
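For illustration only, the following is a minimal sketch of the entropy objective the abstract describes (minimize the agent-conditioned policy entropy, maximize the unconditioned one), assuming a discrete action space and a uniform prior over expert identities so that the unconditioned policy is the average of the conditioned ones. All class, function, and variable names below are hypothetical and are not taken from the paper or its released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

def entropy(probs: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # Shannon entropy along the last (action) dimension.
    return -(probs * (probs + eps).log()).sum(dim=-1)

class ConditionedPolicy(nn.Module):
    # Toy policy network conditioned on a one-hot expert code.
    def __init__(self, state_dim: int, n_experts: int, n_actions: int):
        super().__init__()
        self.n_experts = n_experts
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_experts, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        # Evaluate the policy under every expert code: [B, E, A] logits.
        B = states.shape[0]
        codes = torch.eye(self.n_experts)                      # [E, E]
        s = states.unsqueeze(1).expand(B, self.n_experts, -1)  # [B, E, S]
        c = codes.unsqueeze(0).expand(B, -1, -1)               # [B, E, E]
        return self.net(torch.cat([s, c], dim=-1))             # [B, E, A]

policy = ConditionedPolicy(state_dim=4, n_experts=3, n_actions=5)
states = torch.randn(32, 4)

cond_probs = F.softmax(policy(states), dim=-1)  # pi(a | s, e)
cond_ent = entropy(cond_probs).mean()           # H(pi(a|s,e)): push down
marg_probs = cond_probs.mean(dim=1)             # pi(a | s) under uniform p(e)
marg_ent = entropy(marg_probs).mean()           # H(pi(a|s)): push up

# Minimizing this loss maximizes the gap H(pi(a|s)) - H(pi(a|s,e)), which,
# under the uniform-prior assumption, equals the mutual information
# I(a; e | s): each per-expert policy stays confident while the policies
# remain diverse across expert types.
loss = cond_ent - marg_ent
loss.backward()
print(f"conditional entropy: {cond_ent.item():.3f}, "
      f"marginal entropy: {marg_ent.item():.3f}")

The same gap can be read as an information-theoretic surrogate for behavioral diversity, which is consistent with the abstract's claim that the objective helps imitation policies capture the variety of behaviors across expert agents; the contrastive-learning integration mentioned in the abstract is not sketched here.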
Publication Date
1-1-2023
Recommended Citation
G. Qiao et al., "Multi-Modal Inverse Constrained Reinforcement Learning from a Mixture of Demonstrations," Advances in Neural Information Processing Systems, vol. 36, 2023.