The accelerated computing platform for next-generation workloads
Discover the technological breakthroughs
![](https://www.sysgen.de/media/image/7a/71/7e/hopper-arch-transformer-2c50-dwKT6xq5VRHPkn_640x640.jpg)
Transformer Engine
The NVIDIA Hopper architecture advances Tensor Core technology with the Transformer Engine to accelerate the training of AI models. Hopper Tensor Cores can mix FP8 and FP16 precision to significantly speed up AI computations for Transformers. Hopper also triples the floating-point operations per second (FLOPS) for TF32, FP64, FP16 and INT8 precision over the previous generation. Combined with the Transformer Engine and fourth-generation NVIDIA® NVLink®, Hopper Tensor Cores enable massive acceleration of HPC and AI workloads.
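To give a feel for what FP8 precision means, here is a rough, self-contained sketch of rounding a value to the FP8 E4M3 format (4 exponent bits, 3 mantissa bits) that the Transformer Engine uses. This is an illustration of the number format only, not NVIDIA's hardware implementation; the function name and the simplified subnormal handling are our own.

```python
import math

def quantize_fp8_e4m3(x: float) -> float:
    """Round x to a nearby value representable in FP8 E4M3
    (4 exponent bits, 3 mantissa bits, finite max 448)."""
    if x == 0.0:
        return 0.0
    sign = math.copysign(1.0, x)
    mag = min(abs(x), 448.0)       # clamp to E4M3's largest finite value
    exp = max(math.floor(math.log2(mag)), -6)  # -6 = smallest normal exponent
    step = 2.0 ** (exp - 3)        # 3 mantissa bits -> 8 steps per binade
    return sign * round(mag / step) * step

# FP8 keeps only a couple of significant digits:
print(quantize_fp8_e4m3(3.14159))   # -> 3.25
print(quantize_fp8_e4m3(1000.0))    # -> 448.0 (saturates at the format max)
```

The coarse mantissa is why FP8 is mixed with FP16: matrix multiplies tolerate the reduced precision, while more sensitive values stay in the wider format.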
Learn more about NVIDIA Tensor Cores
NVLink, NVSwitch and NVLink switch systems
To keep pace with the speed of business and enable acceleration at scale, exascale HPC and AI models with trillions of parameters require seamless high-speed communication between all GPUs in a server cluster.
With fourth-generation NVLink, multi-GPU input/output (IO) in NVIDIA DGX™ and HGX™ servers scales to 900 gigabytes per second (GB/s) bidirectional per GPU, more than 7 times the bandwidth of PCIe Gen5.
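The "more than 7x" figure follows directly from the raw link rates. As a quick sanity check (assuming the commonly cited PCIe Gen5 x16 rate of roughly 64 GB/s per direction, i.e. 128 GB/s bidirectional):

```python
# Per-GPU bidirectional bandwidth figures, in GB/s
nvlink_gen4 = 900          # fourth-generation NVLink, per GPU (from the text)
pcie_gen5_x16 = 2 * 64     # PCIe Gen5 x16: ~64 GB/s each direction

print(nvlink_gen4 / pcie_gen5_x16)   # -> 7.03125
```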
The third-generation NVIDIA NVSwitch™ supports in-network computing with the Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)™, previously available only on InfiniBand, and provides a 2x increase in total throughput within eight-GPU H200 or H100 servers compared to previous-generation A100 Tensor Core GPU systems.
With the NVLink Switch System, DGX GH200 systems support clusters of up to 256 connected H200 GPUs and deliver a total bandwidth of 57.6 terabytes per second (TB/s).
More information about NVLink and NVSwitch
![](https://www.sysgen.de/media/image/5a/7a/df/nvlink-switch-system-2c50-d-1_640x640.jpg)
![](https://www.sysgen.de/media/image/28/14/a6/hopper-arch-confidential-computing-2c50-dd2AoCR7Y0aKMa_640x640.jpg)
NVIDIA Confidential Computing
While data is encrypted during storage and transmission over the network, it is unprotected during processing. Confidential Computing closes this gap by protecting data and applications during processing. The NVIDIA Hopper architecture is the world's first accelerated computing platform to support confidential computing.
Strong hardware-based security gives users running applications on-premises, in the cloud or at the edge the assurance that unauthorized parties cannot view or modify application code and data while they are in use. This protects the confidentiality and integrity of data and applications while enabling the unprecedented acceleration of H200 and H100 GPUs for AI training, AI inference and HPC workloads.
Learn more about NVIDIA Confidential Computing
Second-generation MIG
With Multi-Instance GPU (MIG), a GPU can be partitioned into several smaller, fully isolated instances, each with its own memory, cache and compute units. The Hopper architecture enhances MIG further, supporting multi-tenant, multi-user configurations in virtualized environments across up to seven GPU instances, each securely isolated at the hardware and hypervisor level through confidential computing. Dedicated video decoders for each MIG instance enable high-throughput intelligent video analytics (IVA) on shared infrastructure. And with Hopper's concurrent MIG profiling, administrators can monitor right-sized GPU acceleration and optimize resource allocation across users.
Researchers with smaller workloads can use MIG instead of a full CSP instance to securely isolate a portion of a GPU with confidence that their data is protected during storage, transfer and processing.
More information about MIG
![](https://www.sysgen.de/media/image/4c/bf/01/hopper-mig-2c50-diTnP68XQebfCu_640x640.jpg)
![](https://www.sysgen.de/media/image/2d/c0/23/hopper-arch-dpx-2c50-di3QoqdbG1OuRi_640x640.jpg)
DPX instructions
Dynamic programming is an algorithmic technique for solving complex recursive problems by breaking them down into simpler sub-problems. Storing the results of sub-problems so they never have to be recomputed reduces the time and complexity of solving otherwise exponential problems. Dynamic programming is used in a wide range of applications: Floyd-Warshall, for example, is a route-optimization algorithm for planning the shortest routes for shipping and delivery fleets, while the Smith-Waterman algorithm is used in DNA sequence alignment and protein folding applications.
Hopper's DPX instructions enable a 40x speedup of dynamic programming algorithms over traditional dual-socket CPU servers and a 7x speedup over Ampere architecture GPUs. This means that disease diagnosis, route optimization and even graph analysis can be achieved much faster.
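Floyd-Warshall, mentioned above, is a compact example of the kind of dynamic-programming kernel these speedups refer to. Below is a minimal CPU reference implementation in Python, purely for illustration; the DPX figures describe hardware-accelerated instructions, not this sketch.

```python
def floyd_warshall(dist):
    """All-pairs shortest paths on an adjacency matrix (inf = no edge).
    Classic O(V^3) dynamic programming: each pass allows one more
    intermediate vertex k and reuses the previously computed distances."""
    n = len(dist)
    d = [row[:] for row in dist]          # work on a copy
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

INF = float("inf")
graph = [
    [0,   3,   INF, 7],
    [8,   0,   2,   INF],
    [5,   INF, 0,   1],
    [2,   INF, INF, 0],
]
print(floyd_warshall(graph)[1][3])   # shortest 1 -> 3 path: 2 + 1 = 3
```

The inner update is a min-plus operation on stored sub-results; DPX instructions accelerate exactly such fused min/max-plus patterns.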
Further information on DPX instructions
DPX speedup: comparison of HGX H100 with 4 GPUs vs. dual-socket 32-core Ice Lake.