Installation

You can install and use optimum-nvidia through several methods, depending on your needs. The recommended approach for most users is to use the pre-built Docker container, which comes with all necessary dependencies.

Using the Pre-built Docker Container

The quickest way to get started is by pulling the official Docker container from the Hugging Face Docker Hub. This container includes CUDA, TensorRT-LLM, and all Python dependencies.

docker pull huggingface/optimum-nvidia

To run the container and get an interactive shell with GPU access, use the following command:

docker run -it --gpus all --ipc host huggingface/optimum-nvidia
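
If you want model weights downloaded inside the container to persist across runs, you can additionally mount your local Hugging Face cache. This is an optional variant of the command above; the host and container paths below assume the default cache location and a root user inside the container, so adjust them to your setup.

# Optional: mount the local Hugging Face cache so downloaded weights persist across runs
# (paths are assumptions: default cache location on the host, root user in the container)
docker run -it --gpus all --ipc host \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  huggingface/optimum-nvidia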

This container is validated for running models with float32, float16, bfloat16, int8, and fp8 precision.
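
Once inside the container, a quick way to confirm the stack works end to end is to load a model through the Python API. The snippet below is a minimal sketch based on the usage shown in the project README; the model name is only an example (it may require Hugging Face access), the use_fp8 flag mirrors the README example and should be dropped on GPUs without fp8 support, and building the TensorRT-LLM engine on first load can take several minutes.

# Minimal sanity check (sketch; model name and use_fp8 flag are illustrative)
python3 - <<'EOF'
from optimum.nvidia import AutoModelForCausalLM

# First load builds a TensorRT-LLM engine, which can take a while
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf", use_fp8=True)
print("Model loaded:", type(model).__name__)
EOF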

Installing with Pip

You can install the library from PyPI using pip. This method is validated on Ubuntu and requires you to have a compatible CUDA environment and NVIDIA drivers already installed.

First, ensure you have Python 3.10+, pip, and OpenMPI installed:

sudo apt-get update && sudo apt-get -y install python3.10 python3-pip openmpi-bin libopenmpi-dev

Then, install optimum-nvidia, making sure to include NVIDIA's PyPI index to fetch dependencies like tensorrt-llm:

python3 -m pip install --pre --extra-index-url https://pypi.nvidia.com optimum-nvidia
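
After installation, an optional sanity check is to confirm that the Python API imports and that a CUDA device is visible:

# Optional sanity check: the optimum.nvidia API imports and a CUDA device is visible
python3 -c "import torch; from optimum.nvidia import AutoModelForCausalLM; print('CUDA available:', torch.cuda.is_available())"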

Dependencies

The core dependencies for the project are defined in pyproject.toml and include:

  • accelerate
  • datasets
  • huggingface-hub
  • numpy
  • onnx
  • optimum
  • torch
  • transformers
  • tensorrt-llm
  • pynvml
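
To see which versions of these dependencies were actually resolved in your environment, you can inspect them with pip. This is an optional check, not part of the official instructions:

# List the resolved versions of a few core dependencies
python3 -m pip show optimum-nvidia tensorrt-llm torch transformers | grep -E "^(Name|Version):"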

Building from Source

Advanced users and developers who want to customize the build can build the Docker container from source. This process involves building the TensorRT-LLM base image first, then the optimum-nvidia image on top of it.

1. Clone the Repository

Clone the optimum-nvidia repository with its submodules (which includes TensorRT-LLM):

git clone --recursive --depth=1 https://github.com/huggingface/optimum-nvidia.git
cd optimum-nvidia

2. Build the TensorRT-LLM Base Image

Navigate to the tensorrt-llm submodule and build its release container, specifying the CUDA Streaming Multiprocessor (SM) architectures you want to target via the CUDA_ARCHS variable.

# Navigate to the submodule
cd third-party/tensorrt-llm

# Set your target architecture(s)
# For example, for H100 (sm_90) and L40S (sm_89)
TARGET_SM="90-real;89-real"

# Build the base container
make -C docker release_build CUDA_ARCHS="$TARGET_SM"

Here are some common CUDA_ARCHS values:

  • Hopper (H100/H200): 90-real
  • Ada Lovelace (L4/L40/RTX 4090): 89-real
  • Ampere (A100/A30): 80-real
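
If you are unsure which value to use, recent NVIDIA drivers can report the compute capability of the installed GPU(s) directly; for example, 9.0 maps to 90-real and 8.9 to 89-real:

# Query the compute capability of the installed GPU(s) (requires a recent driver)
nvidia-smi --query-gpu=name,compute_cap --format=csv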

3. Build the Optimum-NVIDIA Image

Once the base image is built, return to the root of the optimum-nvidia repository and build the final Docker image:

cd ../..  # Return to optimum-nvidia root
docker build -t my-optimum-nvidia:latest -f docker/Dockerfile .

You can now run your custom-built container using the same flags shown earlier for the pre-built image:
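
# Run the image built above; the tag matches the docker build command
docker run -it --gpus all --ipc host my-optimum-nvidia:latest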