
NVIDIA TensorRT Hyperscale Inference Platform Infographic

THE EXPLOSION OF AI

Demand for personalized services has led to a dramatic increase in the complexity, number, and variety of AI-powered applications and products. Applications use AI inference to recognize images, understand speech, or make recommendations. To be useful, AI inference has to be fast, accurate, and easy to deploy.


THE POWER OF NVIDIA TensorRT

NVIDIA TensorRT™ is a high-performance inference platform that includes an optimizer, runtime engines, and an inference server for deploying applications in production. TensorRT speeds up applications by as much as 40X over CPU-only systems for video streaming, recommendation, and natural language processing workloads.
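To make the optimizer and runtime concrete, here is a minimal sketch of building a serialized TensorRT engine from an ONNX model. It assumes the TensorRT 8.x Python bindings and an illustrative model file named model.onnx; it is a sketch, not NVIDIA's reference code.

    # Minimal TensorRT build flow (assumes TensorRT 8.x; "model.onnx" is illustrative).
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)

    with open("model.onnx", "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("ONNX parse failed")

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)  # allow FP16 kernels where they help

    # Serialize the optimized engine ("plan") for deployment.
    plan = builder.build_serialized_network(network, config)
    with open("model.plan", "wb") as f:
        f.write(plan)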

PRODUCTION-READY DATA CENTER INFERENCE

The NVIDIA TensorRT inference server is a containerized microservice that enables applications to use AI models in data center production. It maximizes GPU utilization, supports all popular AI frameworks, and integrates with Kubernetes and Docker.
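The TensorRT inference server has since been renamed NVIDIA Triton Inference Server. As a hedged sketch of what a client call looks like, using the modern tritonclient HTTP package: the model name resnet50 and the tensor names input/output are illustrative assumptions, not part of the infographic.

    # Client-side sketch (pip install tritonclient[http]); server assumed on :8000.
    import numpy as np
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url="localhost:8000")

    batch = np.random.rand(1, 3, 224, 224).astype(np.float32)  # dummy image batch
    inputs = [httpclient.InferInput("input", list(batch.shape), "FP32")]
    inputs[0].set_data_from_numpy(batch)

    result = client.infer(model_name="resnet50", inputs=inputs)
    print(result.as_numpy("output"))  # output tensor name is also an assumption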


TensorRT applies five key optimizations (a calibration sketch follows this list):

- Weight and Activation Precision Calibration
- Layer and Tensor Fusion
- Multi-Stream Execution
- Dynamic Tensor Memory
- Kernel Auto-Tuning
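The first optimization, precision calibration, is what lets a model trained in FP32 run in INT8 without significant accuracy loss: TensorRT measures activation ranges on representative data. Below is a minimal calibrator sketch, assuming pycuda for device buffers and a user-supplied calibration_batches() generator (both are assumptions).

    # INT8 calibration sketch; calibration_batches() is a hypothetical generator
    # yielding representative numpy batches, e.g. shape (8, 3, 224, 224), float32.
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
    import pycuda.driver as cuda
    import tensorrt as trt

    class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
        def __init__(self, batches, batch_size=8):
            super().__init__()
            self.batches = iter(batches)
            self.batch_size = batch_size
            self.device_mem = None

        def get_batch_size(self):
            return self.batch_size

        def get_batch(self, names):
            try:
                batch = next(self.batches)
            except StopIteration:
                return None  # no more data: calibration is done
            if self.device_mem is None:
                self.device_mem = cuda.mem_alloc(batch.nbytes)
            cuda.memcpy_htod(self.device_mem, batch)
            return [int(self.device_mem)]  # device pointer per input tensor

        def read_calibration_cache(self):
            return None  # no cache: always recalibrate

        def write_calibration_cache(self, cache):
            pass

    # Wiring it into the builder config from the earlier build sketch:
    # config.set_flag(trt.BuilderFlag.INT8)
    # config.int8_calibrator = EntropyCalibrator(calibration_batches())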

Application Developers: Avoid spending time writing inference capabilities from scratch and focus on creating innovative solutions with AI.

DevOps Engineers: Easily deploy inference services for multiple applications and take advantage of orchestration, load balancing, and autoscaling.

Data Scientists and Researchers: Focus on designing and training models using any of the top AI frameworks without worrying about inference implementation.

[Diagram: the inference server serves image and video classification, natural language processing, and recommendation engine applications from a shared model repository.]
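The model repository shown in the diagram is a directory tree the server loads models from. The layout and config.pbtxt schema below follow the conventions of the server's successor, Triton, and are assumptions here; all names ("resnet50", tensor names, shapes) are illustrative.

    # Stage a serialized TensorRT engine into a model repository:
    #
    #   model_repository/
    #     resnet50/
    #       config.pbtxt
    #       1/              <- numeric version directory
    #         model.plan    <- serialized TensorRT engine
    from pathlib import Path
    import shutil

    repo = Path("model_repository/resnet50")
    (repo / "1").mkdir(parents=True, exist_ok=True)
    shutil.copy("model.plan", repo / "1" / "model.plan")

    (repo / "config.pbtxt").write_text(
        'name: "resnet50"\n'
        'platform: "tensorrt_plan"\n'
        'max_batch_size: 8\n'
        'input [ { name: "input"  data_type: TYPE_FP32  dims: [ 3, 224, 224 ] } ]\n'
        'output [ { name: "output"  data_type: TYPE_FP32  dims: [ 1000 ] } ]\n'
    )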


INTRODUCING THE NVIDIA AI INFERENCE PLATFORM

UNDERSTANDING INFERENCE PERFORMANCE

With inference, speed is just the beginning of performance. To get a complete picture of inference performance, there are seven factors to consider (a latency/throughput measurement sketch follows this list):

- Programmability
- Low Latency
- Accuracy
- Size of Network
- Throughput
- Efficiency
- Rate of Learning
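Two of these factors, latency and throughput, are straightforward to measure. Here is a generic sketch that times any inference callable; infer and batch are placeholders you supply.

    # Measure p99 latency and throughput of an inference callable.
    import time
    import numpy as np

    def benchmark(infer, batch, warmup=10, iters=100):
        for _ in range(warmup):              # warm up caches, clocks, lazy init
            infer(batch)
        times = []
        for _ in range(iters):
            start = time.perf_counter()
            infer(batch)
            times.append(time.perf_counter() - start)
        times = np.sort(times)
        p99 = times[int(0.99 * len(times)) - 1]
        throughput = len(batch) / times.mean()   # samples per second
        print(f"p99 latency: {p99 * 1e3:.2f} ms, throughput: {throughput:.0f}/s")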

INSIDE THE NVIDIA TensorRT HYPERSCALE INFERENCE PLATFORM

The NVIDIA TensorRT Hyperscale Inference Platform delivers on all fronts, offering the best inference performance at scale with the versatility to handle the growing diversity of today's networks.

STATE-OF-THE-ART AI: GPU Deep Learning with the NVIDIA TensorRT Hyperscale Inference Platform

NVIDIA T4 POWERED BY TURING TENSOR CORES

Efficient, high-throughput inference depends on a world-class platform. The NVIDIA® Tesla® T4 GPU is the world's most advanced accelerator for all AI inference workloads. Powered by NVIDIA Turing™ Tensor Cores, T4 provides revolutionary multi-precision inference performance to accelerate the diverse applications of modern AI.

Multi-Precision Performance:

- FP16 | Up to 65 TFLOPS
- INT8 | Up to 130 TOPS

(TFLOPS = trillion floating-point operations per second; TOPS = trillion operations per second.)

[Diagram: the NVIDIA TensorRT Inference Server, with preprocessing and postprocessing plugins, orchestrated by Kubernetes on NVIDIA GPUs and able to run on GPU or CPU.]

THE BEST AI PLATFORM. www.nvidia.com/data-center-inference

© 2018 NVIDIA Corporation. All rights reserved. NVIDIA, the NVIDIA logo, TensorRT, and Tesla are trademarks and/or registered trademarks of NVIDIA Corporation in the U.S. and other countries. All other trademarks and copyrights are the property of their respective owners.