
NVIDIA TensorRT – 6x Faster Than PyTorch

Overview

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference. It includes an inference optimizer and a runtime that deliver low latency and high throughput for inference applications.

During inference, TensorRT-based applications run up to 40 times faster than CPU-only platforms. You can use TensorRT to optimize neural network models trained in all major frameworks, calibrate them for reduced precision while maintaining high accuracy, and deploy them to hyperscale data centers, embedded systems, or automotive product platforms.
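As a rough illustration of that workflow, here is a minimal sketch of building a TensorRT engine from an ONNX file exported from one of those frameworks, using the TensorRT Python API (TensorRT 7.x/8.x era). The file name model.onnx and the 1 GiB workspace size are placeholder assumptions, not values from the lecture.

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path="model.onnx"):  # hypothetical path, for illustration
    """Parse an ONNX model and build an optimized TensorRT engine."""
    builder = trt.Builder(TRT_LOGGER)
    # The ONNX parser requires an explicit-batch network definition
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("ONNX parsing failed")

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30  # 1 GiB of scratch memory (assumed)
    return builder.build_engine(network, config)
```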

TensorRT is built on CUDA®, NVIDIA’s parallel programming model, and lets you optimize inference using CUDA-X™ libraries, development tools, and technologies for AI, autonomous machines, high-performance computing, and graphics. TensorRT also takes advantage of sparse Tensor Cores on NVIDIA Ampere Architecture GPUs, delivering an additional performance increase.
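On an Ampere GPU with 2:4 structured-sparse weights, you can ask the builder to consider those sparse kernels through a flag on the builder config; a one-line sketch, assuming TensorRT 8.x and the config object from the snippet above:

```python
# Let TensorRT pick sparse Tensor Core kernels for layers whose weights
# follow the 2:4 structured-sparsity pattern (available in TensorRT 8.x)
config.set_flag(trt.BuilderFlag.SPARSE_WEIGHTS)
```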

For production deployments of deep learning inference applications such as video streaming, speech recognition, recommendation, fraud detection, text generation, and natural language processing, TensorRT provides INT8 (via quantization-aware training and post-training quantization) as well as FP16 optimizations. Reduced-precision inference cuts application latency roughly in half, which is essential for many real-time services as well as autonomous and embedded applications.
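To request those reduced-precision paths when building an engine, the builder config exposes FP16 and INT8 flags. A hedged sketch, assuming TensorRT 8.x; the helper name and the idea that a calibrator is only needed for post-training quantization (QAT models already carry Q/DQ scales) are my framing, not the lecture's:

```python
import tensorrt as trt

def enable_reduced_precision(builder, config, calibrator=None):
    """Turn on FP16 and INT8 kernels where the GPU supports them."""
    if builder.platform_has_fast_fp16:
        config.set_flag(trt.BuilderFlag.FP16)
    if builder.platform_has_fast_int8:
        config.set_flag(trt.BuilderFlag.INT8)
        # Post-training quantization needs a calibrator; networks exported
        # from quantization-aware training already contain Q/DQ scales.
        if calibrator is not None:
            config.int8_calibrator = calibrator
    return config
```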

Watch the lecture above to learn how to install TensorRT and get it up and running in Docker containers on Ubuntu. The lecture outline is as follows; a short timing sketch for the PyTorch baseline appears after the outline:

00:00 Intro to TensorRT
02:20 Prerequisites
03:20 TensorRT Docker Images
06:27 Jupyter Lab within Docker Containers
07:25 Compile TRT OSS
08:26 HuggingFace GPT-2
13:42 PyTorch on CPU/GPU vs TensorRT on GPU
16:42 Outro
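
For the baseline side of that comparison, here is a minimal sketch of timing HuggingFace GPT-2 in PyTorch; the prompt, iteration counts, and the "gpt2" checkpoint name are assumptions for illustration, and the TensorRT-side numbers in the video come from the compiled TRT OSS GPT-2 demo rather than from this snippet.

```python
import time
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device).eval()

inputs = tokenizer("TensorRT makes inference fast", return_tensors="pt").to(device)

with torch.no_grad():
    # Warm-up iterations so CUDA kernels and caches are initialized
    for _ in range(5):
        model(**inputs)
    if device == "cuda":
        torch.cuda.synchronize()

    start = time.perf_counter()
    for _ in range(50):
        model(**inputs)
    if device == "cuda":
        torch.cuda.synchronize()

elapsed_ms = (time.perf_counter() - start) / 50 * 1000
print(f"PyTorch {device} latency: {elapsed_ms:.2f} ms per forward pass")
```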

Finally, subscribe to my channel to support us! We are very close to 100K subscribers <3 With love, the algorithmic chef – Ahmad Bazzi. Click here to subscribe, and check out my other articles here.