Installation
You can install and use optimum-nvidia through several methods, depending on your needs. The recommended approach for most users is to use the pre-built Docker container, which comes with all necessary dependencies.
Using the Pre-built Docker Container
The quickest way to get started is by pulling the official Docker container from the Hugging Face organization on Docker Hub. This container includes CUDA, TensorRT-LLM, and all Python dependencies.
docker pull huggingface/optimum-nvidia
To run the container and get an interactive shell with GPU access, use the following command:
docker run -it --gpus all --ipc host huggingface/optimum-nvidia
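Once inside the container, a quick sanity check can confirm that the GPU is visible to both the driver and the bundled PyTorch install (torch ships with the container's Python dependencies):

# Inside the container: check that the driver sees the GPU
nvidia-smi
# Check that PyTorch can reach CUDA
python3 -c "import torch; print(torch.cuda.is_available())"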
This container is validated for running models with float32, float16, bfloat16, int8, and fp8 precision.
Installing with Pip
You can install the library from PyPI using pip. This method is validated on Ubuntu and requires you to have a compatible CUDA environment and NVIDIA drivers already installed.
First, ensure you have Python 3.10+, pip, and OpenMPI installed:
sudo apt-get update && sudo apt-get -y install python3.10 python3-pip openmpi-bin libopenmpi-dev
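If you want to confirm the prerequisites before proceeding, a quick check might look like this (it assumes an NVIDIA driver is already installed so that nvidia-smi is available):

python3 --version          # should report 3.10 or newer
mpirun --version | head -n 1
nvidia-smi                 # confirms the NVIDIA driver is working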
Then, install optimum-nvidia, making sure to include NVIDIA's PyPI index to fetch dependencies like tensorrt-llm:
python3 -m pip install --pre --extra-index-url https://pypi.nvidia.com optimum-nvidia
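As a quick sanity check, you can try importing the package and its TensorRT-LLM dependency; the exact version printed depends on what the NVIDIA index resolved, and import failures here usually point to a CUDA/driver mismatch:

# Minimal import check after a pip install
python3 -c "import optimum.nvidia, tensorrt_llm; print(tensorrt_llm.__version__)"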
Dependencies
The core dependencies for the project are defined in pyproject.toml and include:
- accelerate
- datasets
- huggingface-hub
- numpy
- onnx
- optimum
- torch
- transformers
- tensorrt-llm
- pynvml
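If you installed from PyPI, you can inspect which of these dependencies were actually resolved in your environment:

# Show package metadata, including the Requires list, for the installed wheel
python3 -m pip show optimum-nvidia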
Building from Source
Advanced users and developers who want to customize the build can build the Docker container from source. This process involves building the TensorRT-LLM base image first.
1. Clone the Repository
Clone the optimum-nvidia repository together with its submodules, which include TensorRT-LLM:
git clone --recursive --depth=1 https://github.com/huggingface/optimum-nvidia.git
cd optimum-nvidia
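If the clone was made without --recursive, or the submodule checkout looks empty, you can fetch it explicitly before building:

# Initialize/update the TensorRT-LLM submodule if it was not fetched during cloning
git submodule update --init --recursive
# Confirm third-party/tensorrt-llm is checked out
git submodule status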
2. Build the TensorRT-LLM Base Image
Navigate to the tensorrt-llm submodule and build its release container. You must specify the CUDA Streaming Multiprocessor (SM) architectures you want to target with the CUDA_ARCHS variable.
# Navigate to the submodule
cd third-party/tensorrt-llm
# Set your target architecture(s)
# For example, for H100 (sm_90) and L40S (sm_89)
TARGET_SM="90-real;89-real"
# Build the base container
make -C docker release_build CUDA_ARCHS=$TARGET_SM
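The build can take a while. Once it finishes, listing your local images should show the newly produced TensorRT-LLM base image (the exact repository name and tag are assigned by the TensorRT-LLM build scripts and may vary between releases):

# Confirm the base image was built; the naming comes from the TensorRT-LLM
# build scripts and may differ between releases
docker images | grep -i tensorrt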
Here are some common CUDA_ARCHS values:
- Hopper (H100/H200): 90-real
- Ada Lovelace (L4/L40/RTX 4090): 89-real
- Ampere (A100/A30): 80-real
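If you are unsure which value to use, recent NVIDIA drivers can report the compute capability of the installed GPU(s) directly:

# Requires a reasonably recent driver; prints e.g. "9.0" for Hopper (use 90-real)
nvidia-smi --query-gpu=name,compute_cap --format=csv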
3. Build the Optimum-NVIDIA Image
Once the base image is built, return to the root of the optimum-nvidia repository and build the final Docker image:
cd ../.. # Return to optimum-nvidia root
docker build -t my-optimum-nvidia:latest -f docker/Dockerfile .
You can now run your custom-built container.
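For example, using the same flags as for the pre-built image (my-optimum-nvidia:latest is simply the illustrative tag chosen in the build step above):

docker run -it --gpus all --ipc host my-optimum-nvidia:latest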