Welcome to Optimum-NVIDIA
Optimum-NVIDIA is the interface between the Hugging Face ecosystem and NVIDIA GPUs, designed to deliver the best possible inference performance. By leveraging NVIDIA TensorRT-LLM under the hood, it lets developers run Large Language Models (LLMs) at significantly higher speeds, up to 28x faster than standard frameworks, often by changing just a single line of code.
This library provides a seamless integration with TensorRT-LLM, enabling you to use familiar Hugging Face APIs like `from_pretrained()` and `pipeline()` to load, convert, and run models optimized for NVIDIA hardware.
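As an illustration, here is a minimal sketch of the pipeline path; the checkpoint name and prompt are placeholders, and the import path assumes the `optimum.nvidia.pipelines` module:

```python
# Minimal sketch: the only change from a plain transformers script is the
# import. The checkpoint name and prompt are illustrative placeholders.
from optimum.nvidia.pipelines import pipeline  # instead of: from transformers import pipeline

pipe = pipeline("text-generation", "meta-llama/Llama-2-7b-chat-hf")
print(pipe("Describe a real-world application of AI in sustainable energy."))
```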
Why Optimum-NVIDIA?
- Peak Performance: Unlock the full potential of your NVIDIA hardware, including Tensor Core GPUs with FP8 support on Hopper and Ada Lovelace architectures.
- Ease of Use: Transition from `transformers` to `optimum-nvidia` with minimal code changes. The library handles the complex model conversion and engine-building process automatically.
- Hugging Face Hub Integration: Fetch and load optimized, pre-built TensorRT-LLM engines directly from the Hugging Face Hub, or let the library build them on the fly from a standard `transformers` model checkpoint, as in the sketch after this list.
- Advanced Quantization: Easily apply advanced quantization techniques like FP8 and AWQ to reduce memory footprint and further accelerate inference, all while maintaining a simple, developer-friendly API.
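As a rough sketch of that on-the-fly path, the snippet below loads a standard `transformers` checkpoint through `optimum.nvidia`; the model name, prompt, and generation settings are assumptions, and the exact return shape of `generate()` may differ between library versions:

```python
# Sketch: loading a standard transformers checkpoint; the TensorRT-LLM
# engine is built automatically at load time. Names and generation
# parameters are illustrative, not prescriptive.
from transformers import AutoTokenizer
from optimum.nvidia import AutoModelForCausalLM

checkpoint = "meta-llama/Llama-2-7b-chat-hf"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer("What makes TensorRT-LLM fast?", return_tensors="pt").to("cuda")
generated = model.generate(**inputs, max_new_tokens=128)
# Assumes transformers-style output (a batch of token-id sequences);
# adjust decoding if your version returns a (ids, lengths) tuple.
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```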
Whether you're looking to speed up a local prototype or deploy a high-throughput inference service, Optimum-NVIDIA provides the tools to make it happen efficiently.
Key Features
- High-Level APIs: Use `AutoModelForCausalLM` and `pipeline` for a familiar, `transformers`-like experience.
- Automated Engine Building: The `from_pretrained` method intelligently handles the conversion of Hugging Face models into TensorRT-LLM engines.
- FP8 Inference: Built-in support for FP8 quantization on compatible hardware (Hopper, Ada Lovelace) via the `use_fp8=True` flag, as shown in the sketch after this list.
- CLI for Export: A powerful command-line interface to export models into standalone TensorRT-LLM engines for deployment.
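For the FP8 path, a minimal sketch using the `use_fp8` flag named above, assuming a Hopper- or Ada Lovelace-class GPU and a placeholder checkpoint:

```python
# Sketch: enabling FP8 quantization at load time. Requires hardware with
# FP8 Tensor Cores (Hopper / Ada Lovelace). Checkpoint is a placeholder.
from optimum.nvidia import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    use_fp8=True,  # the flag documented above; cuts memory use and speeds up inference
)
```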
Ready to get started? Head over to the Installation page.