LLM Performance Tools
Model FLOPs Utilization (MFU) Calculator
Assumptions:
- Uses dense (non-sparse) TFLOPS specs only
- Training uses 6N FLOPs per token (2N forward + 4N backward)
- Inference uses 2N FLOPs per token (both sketched in the code after this list)
- Does not account for memory bandwidth limitations
- Does not account for communication overhead
- Does not account for framework inefficiencies
- Does not account for Sliding Window Attention (SWA)
- Does not account for Grouped Query Attention (GQA)
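The arithmetic behind the calculator is simple enough to write down. Below is a minimal sketch under the 6N/2N approximations above; the function and parameter names are illustrative, not the calculator's actual inputs.

```python
def mfu(
    num_params: float,      # model parameter count N
    tokens_per_sec: float,  # observed throughput, summed across all GPUs
    num_gpus: int,
    peak_tflops: float,     # dense (non-sparse) spec per GPU
    training: bool = True,
) -> float:
    """Model FLOPs Utilization: achieved FLOPS / peak hardware FLOPS."""
    # 6N FLOPs per token for training (2N forward + 4N backward), 2N for inference.
    flops_per_token = (6 if training else 2) * num_params
    achieved_flops = flops_per_token * tokens_per_sec
    peak_flops = num_gpus * peak_tflops * 1e12
    return achieved_flops / peak_flops
```

For example, a 7B-parameter model training at 60,000 tokens/s on 8 GPUs rated at 989 dense BF16 TFLOPS each (roughly an H100 SXM's spec) gives mfu(7e9, 60_000, 8, 989) ≈ 0.32.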
Found an issue? Please report it to finbarrtimbers at google's email service dot com or file an issue on GitHub. Include the inputs you used, expected vs. actual results, and any error messages.
KV Cache & Batch Size Calculator
Assumptions:
- KV cache stores both key and value tensors (factor of 2)
- Full attention layers cache the entire sequence
- Sliding window attention layers only cache the window size
- Model weights are sharded evenly across the GPUs in the tensor-parallel (TP) group
- KV cache is replicated on every GPU in the TP group (both modeled in the sketch after this list)
- Does not account for activation memory during forward pass
- Does not account for framework overhead or memory fragmentation
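A minimal sketch of the arithmetic these assumptions imply. All names (num_kv_heads, window, etc.) are illustrative rather than the calculator's actual inputs; for plain multi-head attention, num_kv_heads is simply the number of attention heads.

```python
def kv_cache_bytes_per_seq(
    seq_len: int,
    num_layers: int,
    num_swa_layers: int,      # layers using sliding-window attention
    window: int,              # sliding-window size in tokens
    num_kv_heads: int,
    head_dim: int,
    bytes_per_elem: int = 2,  # fp16/bf16
) -> int:
    """KV cache for one sequence, per the assumptions above."""
    # Factor of 2: both key and value tensors are cached.
    per_token_per_layer = 2 * num_kv_heads * head_dim * bytes_per_elem
    full_layers = num_layers - num_swa_layers
    full = full_layers * seq_len * per_token_per_layer            # full attention caches the entire sequence
    swa = num_swa_layers * min(seq_len, window) * per_token_per_layer  # SWA layers cache only the window
    return full + swa


def max_batch_size(
    gpu_mem_bytes: float,
    weight_bytes: float,  # total model weight bytes before sharding
    tp: int,              # tensor-parallel degree
    kv_per_seq: int,
) -> int:
    """Largest batch that fits, assuming weights are sharded across the TP
    group while the KV cache is replicated on every GPU (per the list above).
    Ignores activation memory, framework overhead, and fragmentation."""
    free = gpu_mem_bytes - weight_bytes / tp
    return max(0, int(free // kv_per_seq))
```

As a sanity check, a Llama-style 7B configuration (32 layers, no SWA, 32 KV heads, head_dim 128, fp16) at a 4,096-token sequence comes out to 2 × 32 × 128 × 2 × 32 × 4096 bytes = exactly 2 GiB of KV cache per sequence.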