Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/119293
| Field | Value |
|---|---|
| Title | DirMoE: Dirichlet-Routed Mixture of Experts |
| Authors | Vahidi, A; Moullet, M; Asadollahzadeh, H; Ly, K; Yang, X; Attar, NA; Lotfollahi, M |
| Issue Date | 2026 |
| Source | The Fourteenth International Conference on Learning Representations, ICLR 2026, Rio de Janeiro, Brazil, Apr 23-27 2026, https://openreview.net/forum?id=a15cDnzr6r |
| Abstract | Mixture-of-Experts (MoE) models have demonstrated exceptional performance in large-scale language models. Existing routers typically rely on non-differentiable Top-k+Softmax, which limits their performance and scalability. We argue that standard Top-k+Softmax conflates two distinct decisions: which experts to activate, and how to distribute contributions among the activated experts. We introduce Dirichlet-Routed MoE (DirMoE), a novel end-to-end differentiable routing mechanism built on a Dirichlet variational autoencoder framework. This design disentangles the two routing problems: expert selection, modeled by a Bernoulli component, and the allocation of contributions among the chosen experts, handled by a Dirichlet component. The entire forward pass remains fully differentiable through the use of Gumbel-Sigmoid relaxation for expert selection and implicit reparameterization for the Dirichlet distribution. Our training objective, a variational ELBO, includes a direct sparsity penalty that precisely controls the expected number of active experts, alongside a schedule for key hyperparameters that guides the model from an exploratory to a definitive routing state. The DirMoE router matches or exceeds other routing methods while improving expert specialization. |
| Publisher | OpenReview.net |
| Description | The Fourteenth International Conference on Learning Representations, ICLR 2026, Rio de Janeiro, Brazil, Apr 23-27 2026 |
| Rights | CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). The following publication Vahidi, A., Asadollahzadeh, H., Attar, N. A., Moullet, M., Ly, K., Yang, X., & Lotfollahi, M. (2026). DirMoE: Dirichlet-routed Mixture of Experts, in The Fourteenth International Conference on Learning Representations, is available at https://openreview.net/forum?id=a15cDnzr6r. |
| Appears in Collections | Conference Paper |
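The abstract describes the router only at the level of its components. The following stdlib-only Python sketch illustrates a forward pass of that kind of router: a Gumbel-Sigmoid (relaxed Bernoulli) gate per expert for selection, a Dirichlet draw for contributions, and a sparsity term on the expected number of active experts. It is not the authors' implementation. All function names are hypothetical, and three choices are assumptions made for illustration: multiplying the gates by the Dirichlet weights and renormalizing, the squared-error form of the sparsity penalty, and a linear temperature schedule. It only draws samples; the gradient machinery (autodiff through the relaxation, implicit reparameterization of the Dirichlet) and the rest of the paper's ELBO are not shown.

```python
import math
import random


def _sigmoid(x):
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gumbel_sigmoid(logits, tau, rng):
    """Relaxed Bernoulli (Gumbel-Sigmoid) gate for each expert.

    The difference of two Gumbel draws is Logistic(0, 1) noise; adding it
    to a logit and applying a temperature-scaled sigmoid gives a value in
    (0, 1) that approaches a hard 0/1 sample as tau -> 0.
    """
    gates = []
    for logit in logits:
        u = rng.uniform(1e-9, 1.0 - 1e-9)  # keep log() finite
        noise = math.log(u) - math.log(1.0 - u)  # Logistic(0, 1) sample
        gates.append(_sigmoid((logit + noise) / tau))
    return gates


def dirichlet_sample(alphas, rng):
    """Dirichlet draw via normalized Gamma(alpha_i, 1) samples."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def route(logits, alphas, tau, rng):
    """Hypothetical combination: mask Dirichlet weights by gates, renormalize."""
    gates = gumbel_sigmoid(logits, tau, rng)
    theta = dirichlet_sample(alphas, rng)
    masked = [g * t for g, t in zip(gates, theta)]
    total = sum(masked)
    if total == 0.0:  # every gate underflowed to zero: no expert is used
        return masked
    return [m / total for m in masked]


def expected_active(logits):
    """Expected count of experts whose gate exceeds 0.5.

    P(gate > 0.5) = P(logit + noise > 0) = sigmoid(logit) for any tau,
    so E[K] = sum_i sigmoid(logit_i).
    """
    return sum(_sigmoid(z) for z in logits)


def sparsity_penalty(logits, target_k):
    """Hypothetical penalty pulling E[K] toward a target number of experts."""
    return (expected_active(logits) - target_k) ** 2


def temperature(step, total_steps, tau_start=1.0, tau_end=0.1):
    """Hypothetical linear annealing from exploratory to near-hard routing."""
    frac = min(step / total_steps, 1.0)
    return tau_start * (1.0 - frac) + tau_end * frac


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [2.0, 1.0, -1.0, -3.0]
    for step in (0, 500, 1000):
        tau = temperature(step, 1000)
        weights = route(logits, [1.0] * len(logits), tau, rng)
        print(f"tau={tau:.2f}", [round(w, 3) for w in weights])
    print("E[K] =", round(expected_active(logits), 3))
```

Keeping the gates and the Dirichlet weights as separate draws mirrors the separation the abstract describes: the gates decide which experts participate, while the Dirichlet draw decides how the surviving weight is shared. As the temperature falls, the gates tend toward 0 or 1, so the routing becomes sparser and less exploratory.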
Files in This Item:
| File | Size | Format |
|---|---|---|
| Vahidi_DirMoE_Dirichlet_Routed.pdf | 2.21 MB | Adobe PDF |


