context + claim
LLM inference is getting complex. MoE models, disaggregated architectures (prefill/decode separation), expert parallelism — existing simulators assume dense co-located models and can’t handle these new paradigms.
Frontier is a new simulator built from scratch for this landscape.
constraint map
Problem:
- Existing simulators designed for dense co-located models
- Can’t model MoE expert routing dynamics
- Can’t model attention/FFN (AF) disaggregation
- Can’t model cross-cluster communication
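To see why routing dynamics matter, here is a toy top-k router (my own illustration, not Frontier's model): expert loads depend on the batch, so a static dense-model cost formula can't predict per-expert compute or communication. All numbers are made up; real learned routers skew far more than uniform-random assignment.

```python
import random
from collections import Counter

def route_tokens(num_tokens, num_experts, top_k, seed=0):
    """Toy top-k router: each token picks top_k distinct experts
    uniformly at random (a learned router would skew these loads)."""
    rng = random.Random(seed)
    loads = Counter()
    for _ in range(num_tokens):
        for e in rng.sample(range(num_experts), top_k):
            loads[e] += 1
    return loads

def imbalance(loads, num_experts):
    """Max-over-mean expert load: 1.0 means perfectly balanced;
    anything above 1.0 means the slowest expert gates the step."""
    mean = sum(loads.values()) / num_experts
    return max(loads.values()) / mean

loads = route_tokens(num_tokens=4096, num_experts=64, top_k=2)
print(imbalance(loads, 64))  # > 1.0 even with a uniform router
```

Even this uniform toy shows imbalance; a simulator that assumes every "expert" sees the same load (the dense assumption) misses the tail expert that actually sets step latency.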
Frontier’s approach:
- Unified framework for co-located AND disaggregated systems
- Native MoE support with expert parallelism (EP)
- Simulates cross-cluster expert routing
- Advanced pipelining strategies for latency hiding
- Refined operator models for accuracy
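For intuition on what an operator model is doing, a common baseline is a roofline-style estimate: an op is bound by whichever of compute or memory traffic takes longer. This is a generic sketch with made-up hardware numbers, not Frontier's actual operator model.

```python
def op_time(flops, bytes_moved, peak_flops, mem_bw):
    """Roofline-style latency estimate for one operator:
    bound by compute time or memory time, whichever is larger."""
    return max(flops / peak_flops, bytes_moved / mem_bw)

PEAK_FLOPS = 1e15   # assumed accelerator peak, FLOP/s (illustrative)
MEM_BW = 3e12       # assumed HBM bandwidth, B/s (illustrative)

d = 8192
weight_bytes = d * d * 2  # fp16 weight matrix for a square GEMM

# Decode processes 1 token: almost no FLOPs, but the whole weight
# matrix streams from memory -> memory-bound.
decode = op_time(2 * d * d, weight_bytes, PEAK_FLOPS, MEM_BW)

# Prefill processes 4096 tokens against the same weights -> compute-bound.
prefill = op_time(2 * 4096 * d * d, weight_bytes, PEAK_FLOPS, MEM_BW)
```

This asymmetry (memory-bound decode vs compute-bound prefill) is exactly why PD separation exists, and why a simulator needs per-operator models rather than a single aggregate throughput number.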
Key capabilities:
- Model prefill/decode (PD) separation
- Model attention/FFN (AF) disaggregation
- Expert routing in MoE across clusters
- Heterogeneous scaling simulation
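A minimal sketch of what simulating PD separation has to account for (my own toy model, not Frontier's API): once prefill and decode run on different clusters, the KV cache has to cross the interconnect between phases, and that transfer enters the latency budget. Model shapes and link bandwidth below are illustrative.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, dtype_bytes=2):
    """KV cache size for one sequence: K and V tensors per layer."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * dtype_bytes

def pd_latency(prefill_s, decode_s_per_token, out_tokens, kv_bytes, link_bw):
    """Toy end-to-end latency under prefill/decode separation:
    prefill, then KV-cache transfer over the cluster link, then decode."""
    transfer_s = kv_bytes / link_bw
    return prefill_s + transfer_s + out_tokens * decode_s_per_token

# Illustrative: a Llama-like config (32 layers, 8 KV heads, head_dim 128)
# at 4K context -> ~0.5 GB of KV cache to ship between clusters.
kv = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=4096)
total = pd_latency(prefill_s=0.5, decode_s_per_token=0.02,
                   out_tokens=100, kv_bytes=kv, link_bw=100e9)
```

A co-located simulator has no transfer term at all, which is the gap the note above is pointing at; AF disaggregation adds analogous activation-movement terms between the attention and FFN hosts.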
my take
This is systems infrastructure work that enables future research. Good simulators let you explore design spaces cheaply before committing to implementation. Frontier fills a real gap as inference systems diverge from the “one big GPU” model.
Authors from the Hong Kong University of Science and Technology and NVIDIA — strong systems pedigree.
linkage
- moe-architecture — mixture of experts deep dive
- inference-optimization — serving efficiency techniques
- distributed-llm — multi-node inference patterns