So, what exactly is a Mixture of Experts (MoE)? In the context of transformer models, a MoE consists of two main elements:

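  • Sparse MoE layers, which are used in place of dense feed-forward network (FFN) layers and hold a number of experts, each of which is a neural network in its own right
  • A gate network or router, which decides which tokens are sent to which expert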

Source: Mixture of Experts Explained
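
To make the two elements concrete, here is a minimal sketch in plain Python; it is an illustration, not code from the source. The class name ToySparseMoE, its parameters, and the random weights are all hypothetical. Each expert is reduced to a single linear map standing in for a full FFN, and the router keeps the top-k experts per token and applies a softmax over just those scores, which is one common choice among several.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ToySparseMoE:
    """Hypothetical toy layer: a router plus a set of experts."""

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Router (gate network): one weight row per expert.
        self.router = [
            [rng.uniform(-1, 1) for _ in range(dim)]
            for _ in range(num_experts)
        ]
        # Experts: each is a single dim x dim linear map here (assumption);
        # in a real model each would be a full feed-forward network.
        self.experts = [
            [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
            for _ in range(num_experts)
        ]

    def route(self, token):
        """Return the indices of the top-k experts and their mixing weights."""
        logits = matvec(self.router, token)
        ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        chosen = ranked[: self.top_k]
        weights = softmax([logits[i] for i in chosen])
        return chosen, weights

    def forward(self, token):
        """Run only the chosen experts and sum their weighted outputs."""
        chosen, weights = self.route(token)
        output = [0.0] * len(token)
        for index, weight in zip(chosen, weights):
            expert_out = matvec(self.experts[index], token)
            output = [o + weight * e for o, e in zip(output, expert_out)]
        return output


moe = ToySparseMoE(dim=4, num_experts=8, top_k=2)
token = [0.1, -0.2, 0.3, 0.4]
chosen_experts, mixing_weights = moe.route(token)
layer_output = moe.forward(token)
```

The layer is sparse because only top_k of the num_experts experts run for a given token; the remaining experts contribute nothing to that token's output and cost no compute for it.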