Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/119293
Title: DirMoE: Dirichlet-Routed Mixture of Experts
Authors: Vahidi, A
Moullet, M
Asadollahzadeh, H
Ly, K
Yang, X 
Attar, NA
Lotfollahi, M
Issue Date: 2026
Source: The Fourteenth International Conference on Learning Representations, ICLR 2026, Rio de Janeiro, Brazil, Apr 23-27 2026, https://openreview.net/forum?id=a15cDnzr6r
Abstract: Mixture-of-Experts (MoE) models have demonstrated exceptional performance in large-scale language models. Existing routers typically rely on non-differentiable Top-k+Softmax, limiting their performance and scalability. We argue that two distinct decisions, which experts to activate and how to distribute expert contributions among them, are conflated in standard Top-k+Softmax. We introduce Dirichlet-Routed MoE (DirMoE), a novel end-to-end differentiable routing mechanism built on a Dirichlet variational autoencoder framework. This design fundamentally disentangles the core routing problems: expert selection, modeled by a Bernoulli component, and expert contribution among chosen experts, handled by a Dirichlet component. The entire forward pass remains fully differentiable through the use of Gumbel-Sigmoid relaxation for the expert selection and implicit reparameterization for the Dirichlet distribution. Our training objective, a variational ELBO, includes a direct sparsity penalty that precisely controls the number of active experts in expectation, alongside a schedule for key hyperparameters that guides the model from an exploratory to a definitive routing state. Moreover, our DirMoE router matches or exceeds other methods while improving expert specialization.
Publisher: OpenReview.net
Description: The Fourteenth International Conference on Learning Representations, ICLR 2026, Rio de Janeiro, Brazil, Apr 23-27 2026
Rights: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
The following publication Vahidi, A., Asadollahzadeh, H., Attar, N. A., Moullet, M., Ly, K., Yang, X., & Lotfollahi, M. (2026). DirMoE: Dirichlet-routed Mixture of Experts. In The Fourteenth International Conference on Learning Representations is available at https://openreview.net/forum?id=a15cDnzr6r.
Appears in Collections: Conference Paper
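
The abstract describes routing as two separate decisions: a Bernoulli-style choice of which experts are active, relaxed with Gumbel-Sigmoid, and a Dirichlet-distributed split of contribution among them. The following is a minimal sketch of that decomposition for a single token, not the authors' implementation. It only draws samples with the Python standard library and does not compute gradients, so it does not show implicit reparameterization. The function names, the temperature value, and the rule for combining gates with Dirichlet shares (multiply, then renormalize) are assumptions made for illustration.

```python
import math
import random


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gumbel_sigmoid(logit, temperature, rng):
    """Relaxed Bernoulli sample in [0, 1] for one expert.

    The difference of two Gumbel variables is logistic noise, so the
    relaxation is sigmoid((logit + logistic_noise) / temperature).
    """
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    noise = math.log(u) - math.log(1.0 - u)
    return _sigmoid((logit + noise) / temperature)


def expected_active(select_logits):
    """Expected number of active experts: the sum of selection probabilities."""
    return sum(_sigmoid(l) for l in select_logits)


def route(select_logits, concentrations, temperature=0.5, seed=0):
    """Sample soft selection gates and contribution weights for one token.

    select_logits:  per-expert logits for the selection decision
    concentrations: per-expert Dirichlet concentration parameters (> 0)
    """
    rng = random.Random(seed)

    # Decision 1: which experts to activate (relaxed Bernoulli gates).
    gates = [gumbel_sigmoid(l, temperature, rng) for l in select_logits]

    # Decision 2: how to split contribution (Dirichlet sample obtained by
    # normalizing independent Gamma draws).
    gammas = [rng.gammavariate(a, 1.0) for a in concentrations]
    total = sum(gammas)
    shares = [g / total for g in gammas]

    # Assumed combination rule: gate the shares, then renormalize.
    masked = [g * s for g, s in zip(gates, shares)]
    norm = sum(masked)
    weights = [m / norm for m in masked]
    return gates, weights


if __name__ == "__main__":
    gates, weights = route([2.0, -1.0, 0.5, -3.0], [1.0, 2.0, 0.5, 1.5])
    print("gates:  ", [round(g, 3) for g in gates])
    print("weights:", [round(w, 3) for w in weights])
    print("expected active:", round(expected_active([2.0, -1.0, 0.5, -3.0]), 3))
```

Lowering `temperature` pushes the gates toward 0 or 1, which loosely mirrors the scheduled move from exploratory to definitive routing that the abstract mentions. The quantity returned by `expected_active` is what a sparsity penalty on the expected number of active experts would act on. The exact form of that penalty is not given in this record.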

Files in This Item:
File: Vahidi_DirMoE_Dirichlet_Routed.pdf
Size: 2.21 MB
Format: Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record
