NVIDIA Hopper architecture

The engine for the world's AI infrastructure delivers an order-of-magnitude leap in performance.

The accelerated computing platform for next-generation workloads

Learn more about the next breakthrough in accelerated computing: the NVIDIA Hopper™ architecture. Hopper securely scales diverse workloads in any data center, from small enterprise deployments to exascale high-performance computing (HPC) and AI models with trillions of parameters, enabling brilliant innovators to realize their life's work faster than ever.

Discover the technological breakthroughs

Hopper is built with over 80 billion transistors on a state-of-the-art TSMC 4N process. The architecture introduces five breakthrough innovations in the NVIDIA H200 and H100 Tensor Core GPUs, which together deliver a 30x speedup over the previous generation for AI inference on NVIDIA's Megatron 530B chatbot, the world's largest generative language model.

Transformer engine

The NVIDIA Hopper architecture extends Tensor Core technology with the Transformer engine to accelerate the training of AI models. Hopper Tensor Cores apply mixed FP8 and FP16 precision to dramatically accelerate AI calculations for transformers. Hopper also delivers triple the floating-point operations per second (FLOPS) of the previous generation for TF32, FP64, FP16 and INT8 precisions. Combined with the Transformer engine and fourth-generation NVIDIA® NVLink®, Hopper Tensor Cores enable massive acceleration of HPC and AI workloads.
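
As a rough illustration of the FP8 format at the heart of this engine, the following standalone CUDA sketch (this is not NVIDIA's Transformer Engine itself; the values and scale factor are illustrative) quantizes activations to FP8 E4M3 using CUDA's cuda_fp8.h header (available in CUDA 11.8 and later) and converts them back:

    // Sketch: round-trip float -> FP8 (E4M3) -> float to show the
    // reduced-precision format Hopper Tensor Cores consume.
    #include <cuda_fp8.h>
    #include <cstdio>

    __global__ void roundtrip_fp8(const float* in, float* out, float scale, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            // Scale into FP8's narrow dynamic range, quantize, then dequantize.
            __nv_fp8_e4m3 q = __nv_fp8_e4m3(in[i] * scale);
            out[i] = float(q) / scale;
        }
    }

    int main() {
        const int n = 4;
        float h_in[n] = {0.1f, 1.5f, -3.25f, 448.0f};  // 448 is near E4M3's max
        float h_out[n];
        float *d_in, *d_out;
        cudaMalloc(&d_in, n * sizeof(float));
        cudaMalloc(&d_out, n * sizeof(float));
        cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
        roundtrip_fp8<<<1, 32>>>(d_in, d_out, /*scale=*/1.0f, n);
        cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
        for (int i = 0; i < n; ++i)
            printf("%g -> %g\n", h_in[i], h_out[i]);
        cudaFree(d_in);
        cudaFree(d_out);
        return 0;
    }

In production, the Transformer engine manages such per-tensor scale factors automatically, choosing between FP8 and FP16 layer by layer to preserve accuracy.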


Learn more about NVIDIA Tensor Cores

NVLink, NVSwitch and NVLink switch systems

To keep pace with business and enable acceleration at scale, exascale HPC and AI models with trillions of parameters require seamless, high-speed communication between every GPU in a server cluster.

The fourth generation of NVLink scales multi-GPU input/output (IO) with NVIDIA DGX™ and HGX™ servers at 900 gigabytes per second (GB/s) bidirectional per GPU, more than 7x the bandwidth of PCIe Gen5.

The third-generation NVIDIA NVSwitch™ supports in-network computing with the Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)™, previously available only on InfiniBand, and doubles total throughput within servers of eight H200 or H100 GPUs compared with previous-generation A100 Tensor Core GPU systems.

DGX GH200 systems with the NVLink switch system support clusters of up to 256 connected H200 GPUs and deliver a total bandwidth of 57.6 terabytes per second (TB/s).
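
As a minimal illustration of the GPU-to-GPU communication NVLink accelerates, the following CUDA runtime sketch (device ordinals and buffer size are illustrative) enables peer access between two GPUs and copies a buffer directly between them; on NVLink-connected GPUs, such copies travel over the NVLink fabric instead of PCIe:

    // Sketch: direct peer-to-peer copy between two GPUs with the
    // standard CUDA runtime API, no staging through host memory.
    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        int canAccess = 0;
        // Check whether device 0 can address device 1's memory directly.
        cudaDeviceCanAccessPeer(&canAccess, /*device=*/0, /*peerDevice=*/1);
        if (!canAccess) {
            printf("No peer access between GPU 0 and GPU 1\n");
            return 1;
        }

        size_t bytes = 256 << 20;  // 256 MiB test buffer (illustrative)
        void *buf0, *buf1;
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);  // map GPU 1's memory into GPU 0's space
        cudaMalloc(&buf0, bytes);
        cudaSetDevice(1);
        cudaMalloc(&buf1, bytes);

        // Copy directly from GPU 0's buffer to GPU 1's buffer.
        cudaMemcpyPeer(buf1, 1, buf0, 0, bytes);
        cudaDeviceSynchronize();

        cudaFree(buf1);
        cudaSetDevice(0);
        cudaFree(buf0);
        return 0;
    }

The same code runs unchanged on PCIe-only systems; NVLink simply raises the bandwidth ceiling for the peer copy.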


More information about NVLink and NVSwitch

NVIDIA Confidential Computing

Data is encrypted at rest in storage and in transit across the network, but it is unprotected while it is being processed. Confidential computing closes this gap by protecting data and applications during processing. The NVIDIA Hopper architecture is the world's first accelerated computing platform with support for confidential computing.

Strong hardware-based security gives users running applications on-premises, in the cloud or at the edge the assurance that unauthorized parties cannot view or modify application code and data while they are in use. This protects the confidentiality and integrity of data and applications while enabling the unprecedented acceleration of H200 and H100 GPUs for AI training, AI inference and HPC workloads.


Learn more about NVIDIA Confidential Computing

Second-generation MIG

With Multi-Instance GPU (MIG), a GPU can be partitioned into several smaller, fully isolated instances, each with its own memory, cache and compute cores. The Hopper architecture enhances MIG further, supporting multi-tenant, multi-user configurations in virtualized environments across up to seven GPU instances, each securely isolated at the hardware and hypervisor level through confidential computing. Dedicated video decoders for every MIG instance deliver high-throughput intelligent video analytics (IVA) on shared infrastructure. And with Hopper's concurrent MIG profiling, administrators can monitor right-sized GPU acceleration and optimize resource allocation across users.

Researchers with smaller workloads can use MIG instead of renting a full CSP instance to securely isolate a portion of a GPU, confident that their data is protected at rest, in transit and during processing.
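
As a minimal sketch of how software can discover MIG instances, the following C program uses NVIDIA's NVML management library (nvml.h, linked with -lnvidia-ml). Error handling is trimmed for brevity, and GPU index 0 is an assumption:

    // Sketch: enumerate the MIG instances carved out of a physical GPU
    // via the NVML C API.
    #include <nvml.h>
    #include <stdio.h>

    int main(void) {
        nvmlInit();
        nvmlDevice_t dev;
        nvmlDeviceGetHandleByIndex(0, &dev);  // physical GPU 0 (assumption)

        unsigned int current, pending;
        if (nvmlDeviceGetMigMode(dev, &current, &pending) == NVML_SUCCESS &&
            current == NVML_DEVICE_MIG_ENABLE) {
            unsigned int count = 0;
            nvmlDeviceGetMaxMigDeviceCount(dev, &count);  // up to 7 on Hopper
            for (unsigned int i = 0; i < count; ++i) {
                nvmlDevice_t mig;
                if (nvmlDeviceGetMigDeviceHandleByIndex(dev, i, &mig) != NVML_SUCCESS)
                    continue;  // this slot holds no instance
                char uuid[NVML_DEVICE_UUID_V2_BUFFER_SIZE];
                nvmlDeviceGetUUID(mig, uuid, sizeof(uuid));
                printf("MIG instance %u: %s\n", i, uuid);
            }
        } else {
            printf("MIG is not enabled on GPU 0\n");
        }
        nvmlShutdown();
        return 0;
    }

Each printed UUID can be passed via CUDA_VISIBLE_DEVICES so that a workload sees only its own isolated slice of the GPU.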


More information about MIG

DPX instructions

Dynamic programming is an algorithmic technique for solving complex recursive problems by breaking them down into simpler subproblems. Storing the results of subproblems so that they never have to be recomputed later reduces the time and complexity of solving otherwise exponential problems. Dynamic programming is used in a wide variety of applications: Floyd-Warshall, for example, is a route-optimization algorithm for planning the shortest routes for shipping and delivery fleets, while the Smith-Waterman algorithm is used for DNA sequence alignment and protein-folding applications.

Hopper's DPX instructions accelerate dynamic programming algorithms by 40x compared with traditional dual-socket CPU servers and by 7x compared with Ampere-architecture GPUs. That means disease diagnosis, route optimization and even graph analysis can be completed much faster.
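
As an illustrative sketch, the Floyd-Warshall relaxation step maps directly onto one of the DPX intrinsics that CUDA 12 exposes: __viaddmin_s32(a, b, c), which returns min(a + b, c). The intrinsic is hardware-accelerated on Hopper GPUs and emulated in software on earlier architectures. Launch configuration and the INF sentinel below are illustrative:

    // Sketch: Floyd-Warshall all-pairs shortest paths; the inner
    // min(a + b, c) relaxation uses a DPX intrinsic.
    #include <cuda_runtime.h>

    #define INF (1 << 28)  // large-but-safe sentinel that avoids signed overflow

    // One pass for intermediate vertex k:
    // dist[i][j] = min(dist[i][k] + dist[k][j], dist[i][j])
    __global__ void fw_relax(int* dist, int n, int k) {
        int i = blockIdx.y * blockDim.y + threadIdx.y;
        int j = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n && j < n) {
            dist[i * n + j] =
                __viaddmin_s32(dist[i * n + k], dist[k * n + j], dist[i * n + j]);
        }
    }

    // Host loop over all intermediate vertices; d_dist is an n*n adjacency
    // matrix in device memory, initialized with edge weights and INF.
    void floyd_warshall(int* d_dist, int n) {
        dim3 block(16, 16);
        dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
        for (int k = 0; k < n; ++k)
            fw_relax<<<grid, block>>>(d_dist, n, k);
        cudaDeviceSynchronize();
    }

Row k and column k are fixed points of pass k, so all cells of a pass can be relaxed in parallel; the DPX intrinsic fuses the add and min that dominate this inner loop.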


Further information on DPX instructions
Preliminary specifications, subject to change.
DPX comparison: HGX H100 with 4 GPUs vs. 32-core IceLake.

In-depth insight into the NVIDIA Hopper architecture

Read the whitepaper