Model size (parameters, billions)
Tokens per second
Number of accelerators
Accelerator type: NVIDIA A100, NVIDIA H100, NVIDIA B200, TPU v5p, TPU v5e, TPU v6e (Trillium), TPU v7 (Ironwood)
Data type: FP8, FP16/BF16, FP32
Workload: Inference, Training