context + claim

LLM inference is getting complex. MoE models, disaggregated architectures (prefill/decode separation), and expert parallelism are now mainstream, but existing simulators assume dense, co-located models and can’t handle these new paradigms.

Frontier is a new simulator built from scratch for this landscape.

constraint map

Problem:

  • Existing simulators designed for dense co-located models
  • Can’t model MoE expert routing dynamics
  • Can’t model attention/FFN (AF) disaggregation
  • Can’t model cross-cluster communication

Frontier’s approach:

  • Unified framework for co-located AND disaggregated systems
  • Native MoE support with expert parallelism (EP)
  • Simulates cross-cluster expert routing
  • Advanced pipelining strategies for latency hiding
  • Refined operator models for accuracy
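
Cross-cluster expert routing is essentially an all-to-all dispatch problem: each token's activation must travel to its top-k experts, wherever they live. A back-of-envelope version of that cost can be sketched as below. This is purely illustrative and assumes a bandwidth-bound link; it is not Frontier's actual operator model, and all parameter values are made up.

```python
# Hypothetical cost model for MoE expert dispatch under expert parallelism
# (EP). Illustrative only -- not Frontier's calibrated model.

def all_to_all_bytes(tokens: int, hidden: int, top_k: int,
                     bytes_per_elem: int = 2) -> int:
    """Payload one EP rank dispatches: each token is routed to top_k
    experts, and each routed copy carries a hidden-size activation."""
    return tokens * top_k * hidden * bytes_per_elem

def dispatch_time_s(tokens: int, hidden: int, top_k: int,
                    link_gbps: float, bytes_per_elem: int = 2) -> float:
    """Bandwidth-bound estimate of one dispatch phase over a cross-cluster
    link of link_gbps (Gbit/s); ignores per-message latency and overlap."""
    bits = all_to_all_bytes(tokens, hidden, top_k, bytes_per_elem) * 8
    return bits / (link_gbps * 1e9)

# Example (made-up numbers): 4096 tokens, hidden size 7168, top-8 routing,
# fp16 activations, 400 Gbit/s cross-cluster link -> roughly 9.4 ms.
t = dispatch_time_s(4096, 7168, 8, 400.0)
```

Even this crude estimate shows why latency hiding matters: a ~9 ms dispatch per MoE layer is ruinous unless it overlaps with compute, which is exactly the pipelining design space a simulator like this lets you explore.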

Key capabilities:

  • Model prefill/decode (PD) separation
  • Model attention/FFN (AF) disaggregation
  • Expert routing in MoE across clusters
  • Heterogeneous scaling simulation
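
To make the PD-separation capability concrete, here is a toy end-to-end latency model: time-to-first-token comes from the prefill cluster plus a one-time KV-cache transfer, and each subsequent output token costs one decode step. This is a minimal sketch with assumed parameter values, not Frontier's refined operator models.

```python
# Toy latency model for prefill/decode (PD) disaggregation.
# All numbers and the function itself are illustrative assumptions.

def request_latency_s(prefill_s: float, kv_transfer_s: float,
                      tpot_s: float, out_tokens: int) -> float:
    """End-to-end request latency: TTFT (prefill + KV-cache transfer
    between clusters) plus per-token decode steps for remaining tokens."""
    ttft = prefill_s + kv_transfer_s
    return ttft + tpot_s * max(out_tokens - 1, 0)

# Example (made-up numbers): 120 ms prefill, 15 ms KV transfer,
# 25 ms/token decode, 100 output tokens -> 0.135 + 99 * 0.025 = 2.61 s.
lat = request_latency_s(0.120, 0.015, 0.025, 100)
```

The point of disaggregation is that the two terms scale independently: you can add prefill capacity to cut TTFT without touching the decode fleet, and vice versa, which is why a unified simulator for both regimes is useful.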

my take

This is systems infrastructure work that enables future research. Good simulators let you explore design spaces cheaply before committing to implementation. Frontier fills a real gap as inference systems diverge from the “one big GPU” model.

The authors are from the Hong Kong University of Science and Technology and NVIDIA, a strong systems pedigree.

linkage