Last week, Intel released Ice Lake, its third-generation Xeon Scalable processor built on its 10nm process. On Monday, Mobileye, the Israeli self-driving chip company acquired by Intel, announced a collaboration with autonomous vehicle (AV) startup Udelv to develop driverless cargo vehicles called Transporters, using Mobileye’s EyeQ 5 chip and full-stack AV system platform.
Also on Monday, Nvidia announced Grace, a data center processor based on the Arm architecture, along with the autonomous driving platform DRIVE Hyperion 8 and the AV chip DRIVE Atlan, which it claims will deliver up to 1,000 TOPS.
Intel holds more than 90% of the server CPU market, and AMD’s EPYC server CPUs have so far been unable to shake its position. Can Grace, which Nvidia built on Arm technology, compete against Intel? Can Nvidia’s highest-performance autonomous driving chips and platforms surpass Intel’s Mobileye?
In data centers and autonomous driving, the two markets with the highest computing performance and the hottest demand, Nvidia is now confronting Intel head-on. Can Nvidia, which has lived in Intel’s shadow for many years, draw level with Intel this time and even dominate the global high-performance computing (HPC) market? Before answering this question, let’s take a look at each company’s “trump cards.”
NVIDIA AV platform Hyperion 8 and AV chip DRIVE Atlan
Nvidia CEO Jensen Huang announced DRIVE Atlan at the virtual GTC conference. This next-generation DRIVE SoC, planned for mass production in 2025, will provide up to 1,000 TOPS of performance. It integrates a CPU, a GPU and deep learning accelerators (DLA), and provides the latest networking and security for next-generation AVs.
Nvidia’s autonomous driving chip DRIVE Atlan. (Source: Nvidia)
NVIDIA also released the autonomous driving development platform DRIVE Hyperion 8 and the digital twin simulation tool DRIVE Sim. The company claims to have signed automotive development agreements worth US$8 billion, with partners including Volvo Cars, Mercedes-Benz, NIO, SAIC, TuSimple, Zoox, Cruise, Faraday Future, VinFast and other established automakers and new entrants.
The DRIVE Hyperion 8 AV platform can support data acquisition, AV development and testing. (Source: Nvidia)
Among all the technologies and products Nvidia announced, Egil Juliussen, an EE Times automotive columnist and senior automotive industry analyst, believes Hyperion 8 may be the most valuable. This AV development platform may attract more car OEMs, autonomous driving startups and transportation companies because it can greatly simplify AV system design and pave the way for future product development. It plays a role similar to development systems in the IC design industry, or to a cloud computing platform for the AI-based autonomous driving industry. In his keynote, Jensen Huang did mention cooperation agreements reached with Amazon AWS and Google Cloud.
DRIVE Atlan continues Nvidia’s consistent practice of steadily improving its GPU-based self-driving SoCs, but Mike Demler, a senior analyst at Linley Research, noted that Nvidia seems to be announcing its future processors earlier and earlier. Is the aim to leave the industry and its competitors no room to breathe? Orin has not yet entered mass production, and Nvidia has already announced its successor, Atlan. The Atlan schematics and the 1,000 TOPS performance specification may be no more than top-level design goals.
Nvidia’s DRIVE SoC planning diagram. (Source: Nvidia)
Demler was skeptical: Drive Pegasus with Xavier was billed as an L5 system at 320 TOPS, Orin then raised that to 400 TOPS, and now Atlan jumps to 1,000 TOPS? Comparable AV chips from Intel’s Mobileye carry about 1/10 of that “TOPS” rating, at much lower power consumption. Clearly, TOPS is not a reliable measure. He added that winning designs at Mercedes-Benz and Volvo is good news, but it means little until the vehicles reach mass production.
Juliussen agreed that 1,000 TOPS sounds good but is almost impossible to achieve in practice. He suggests that TOPS should stand for “Totally Optimistic Processor Speed,” and that power consumption is the meaningful figure.
Demler also questioned Atlan’s SoC architecture. Nvidia hopes a single Atlan chip can integrate all in-car computing functions, including the dashboard, infotainment, ADAS/AV, driver monitoring (DMS) and the network gateway. In Nvidia’s view, a car is a server on wheels. Unlike data centers, however, self-driving cars do not have an unlimited power supply. It is not clear that integrating all these functions on a single chip is the best approach, although Nvidia will launch a family of Atlan chips.
Mobileye adopts a system-level AV strategy
How does Nvidia lead in the automotive market? With its complete ecosystem (hardware, software and AI models) and SoCs of ever-increasing performance. Mobileye, by contrast, adopts a system-level AV strategy, and its secret weapon is “true redundancy.”
Udelv, the AV startup working with Mobileye, plans to produce 35,000 driverless Transporters by 2028, all using Mobileye’s full-stack autonomous driving system. According to Udelv co-founder and CEO Daniel Laury, the company initially adopted Baidu’s Apollo platform but ultimately chose Mobileye, mainly for its redundant self-driving design, which Mobileye calls “true redundancy.” Mobileye takes the unusual approach of splitting the sensors into two channels, one for cameras and the other for radar and lidar. The idea is to have each channel prove its safety independently and only then merge the two. Competitors, in contrast, deploy complementary sensors and fuse them from the start into a single model.
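The appeal of two independently validated channels can be illustrated with a toy probability calculation. The sketch below assumes, purely for illustration, that the camera channel and the radar/lidar channel fail independently of each other; the failure probabilities are hypothetical placeholders, not Mobileye figures, and real sensor failures are rarely perfectly independent.

```python
def combined_failure_probability(p_camera: float, p_radar_lidar: float) -> float:
    """Probability that both channels fail together, assuming independence."""
    return p_camera * p_radar_lidar


# Hypothetical per-hour failure probabilities, chosen only for illustration.
p_camera = 1e-4
p_radar_lidar = 1e-4

both_fail = combined_failure_probability(p_camera, p_radar_lidar)
print(f"Camera channel alone:      {p_camera:.0e}")
print(f"Radar/lidar channel alone: {p_radar_lidar:.0e}")
print(f"Both at the same time:     {both_fail:.0e}")
```

Under these assumptions the system fails only when both channels fail at once, so the combined probability is the product of the two and is far smaller than either channel's own.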
Udelv’s Transporter driverless cargo truck uses Mobileye’s full-stack autonomous driving system. (Source: Udelv)
Udelv also likes Mobileye’s crowdsourced mapping solution, Road Experience Management (REM), which supports wide map coverage. Mobileye claims that it can map more than 8 million kilometers of roads every day and that it has already mapped nearly 1 billion kilometers. The company predicts that by 2024, it will be able to map 1 million kilometers a day.
Intel Xeon processors span cloud, network and intelligent edge
According to Intel, the latest third-generation Xeon Scalable processors deliver an average performance increase of 46% over the previous generation on mainstream data center workloads, with Deep Learning Boost (DL Boost) technology for AI acceleration. Built on the 10nm process, the Xeon Scalable processors can accelerate the deployment of cloud, artificial intelligence, enterprise, high-performance computing, networking, security and edge applications.
Navin Shenoy, Intel executive vice president and general manager of the Data Platform Division, released the third-generation Intel Xeon Scalable processor. (Source: Intel)
According to Intel, more than 200,000 units of the processor shipped in the first quarter of 2021. Large cloud service providers around the world are preparing to deploy services based on it. There are more than 250 designs based on the processor across 50 independent OxM partners, and more than 20 high-performance computing (HPC) laboratories and HPC-as-a-service environments are already using the new Xeon Scalable processor.
Grace, Nvidia’s Arm-based data center CPU
At GTC 2021, NVIDIA announced Grace, its first data center CPU based on the Arm architecture, which it says can deliver 10 times the performance of today’s fastest servers on the most complex AI and high-performance computing workloads.
Analysts see Nvidia’s move as a direct challenge to Intel’s dominance in server and data center computing; after the announcement, the stock prices of Intel and AMD fell by several percentage points.
Why build this CPU?
Nvidia argues that data volumes and the scale of AI models are growing exponentially. Today’s largest AI models have billions of parameters, and that number doubles every two and a half months. Training them requires a new CPU that can be tightly coupled with the GPU to eliminate system bottlenecks.
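To get a feel for what doubling every two and a half months implies, the short sketch below converts that doubling period into a growth factor over a longer span. It assumes the doubling rate stays constant, which is a simplification; the function name is illustrative only.

```python
def growth_factor(months: float, doubling_months: float = 2.5) -> float:
    """Multiplicative growth after `months`, given a fixed doubling period."""
    return 2.0 ** (months / doubling_months)


one_year = growth_factor(12)
print(f"Growth over 12 months: about {one_year:.1f}x")
```

At a constant 2.5-month doubling period, model size would grow by a factor of roughly 28 in a single year, which is the pressure on memory capacity and CPU-GPU bandwidth that Nvidia cites.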
Jensen Huang, the founder and CEO of NVIDIA, said: “The NVIDIA Grace CPU is the result of more than 10,000 engineering years of work, and is designed to meet the computing requirements of the world’s most advanced applications.” These applications include natural language processing, recommendation systems and AI supercomputing, all of which analyze massive data sets and require ultra-fast computing performance and large memory capacity.
Huang again wore his trademark leather jacket and presented from his now-familiar kitchen. The only change was that his hair had grown longer. Some people joked that “Lao Huang is worried about being out of stock.”
Grace is named after Grace Hopper, a United States Navy rear admiral and computer programming pioneer. She was one of the first programmers of the Harvard Mark I and the inventor of the first linker.
Grace Hopper pioneered computer programming in the 1950s and invented the world’s first compiler; she is known as the “first lady of computer software engineering.”
The CPU uses Arm Neoverse cores combined with a low-power memory subsystem to deliver high performance with high energy efficiency. Some see the product as a gesture of good faith from Nvidia at a critical moment in its acquisition of Arm.
“Cutting-edge AI and data science are pushing today’s computer architecture beyond its limits to process unimaginable amounts of data. Using IP licensed from Arm, NVIDIA designed Grace as a CPU built specifically for large-scale AI and HPC. Together with the GPU and DPU, Grace gives us the third foundational computing technology and the ability to re-architect the data center to advance AI. NVIDIA is now a three-chip company,” said Jensen Huang.
How does it compare to x86 CPUs?
In its press release, Nvidia describes Grace as a highly specialized processor aimed at workloads such as training next-generation NLP models with more than 1 trillion parameters. When tightly coupled with NVIDIA GPUs, a Grace-based system is 10 times faster than today’s most advanced NVIDIA DGX®-based systems, which run on x86 CPUs.
While the vast majority of data centers will continue to be served by existing CPUs, Grace will serve a niche segment of the computing market.
The Swiss National Supercomputing Centre (CSCS) and the US Department of Energy’s Los Alamos National Laboratory are the first to announce plans to build Grace-equipped supercomputers to support national scientific research.
NVIDIA is launching Grace against a backdrop of exponential growth in data volumes and AI model sizes. Today’s largest AI models contain billions of parameters, and the parameter count doubles every two and a half months. Training these models requires a new CPU tightly coupled with the GPU to eliminate system bottlenecks.
NVIDIA took advantage of the great flexibility of the Arm data center architecture to build Grace. By introducing a new server-class CPU, NVIDIA is advancing technological diversity in the AI and HPC fields, where more choice is key to delivering the innovation needed to solve the world’s most pressing problems.
Arm CEO Simon Segars said: “As the world’s most widely licensed processor architecture, Arm is driving innovation in incredible new ways every day. NVIDIA’s launch of the Grace data center CPU clearly demonstrates how Arm’s licensing model enables an important innovation, one that will further support the extraordinary work of AI researchers and scientists around the world.”
Grace’s first users
CSCS and Los Alamos National Laboratory both plan to bring Grace-equipped supercomputers built by Hewlett Packard Enterprise online in 2023. The CSCS system, named Alps, uses the new HPE Cray EX supercomputer product line and the NVIDIA HGX supercomputing platform. In addition to the new Grace CPU, it also includes NVIDIA GPUs and the NVIDIA HPC SDK.
CSCS Director Professor Thomas Schulthess said: “Using NVIDIA’s new Grace CPU allows us to bring together AI technology and traditional supercomputing to solve some of the most difficult problems in computational science. We are very happy to make this new NVIDIA CPU available to our users in Switzerland and around the world for processing and analyzing massive, complex scientific data sets.”
Thom Mason, Director of Los Alamos National Laboratory, said: “By innovatively balancing memory bandwidth and capacity, this new generation of systems will reshape our organization’s computing strategy. With NVIDIA’s new Grace CPU, we can carry out advanced scientific research using high-fidelity 3D simulation and analysis on data sets larger than was previously possible.”
The Alps system belongs to a new generation of supercomputers and will replace CSCS’s existing Piz Daint supercomputer. This new generation uses GPU-accelerated deep learning to extend supercomputing beyond traditional modeling and simulation.
Thomas Schulthess said: “Deep learning is just a set of very powerful tools that we added to the toolbox.”
Taking advantage of the tight coupling between NVIDIA CPUs and GPUs, Alps is expected to train GPT-3, the world’s largest natural language processing model, in only two days, 7 times faster than NVIDIA’s 2.8-AI-exaflops Selene supercomputer, which MLPerf currently recognizes as the world’s leading AI supercomputer.
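The two figures in this estimate imply a training time for Selene that the text does not state directly. The sketch below works it out, assuming that “7 times faster” means a sevenfold ratio of wall-clock training times; that reading is an assumption, and the helper name is illustrative.

```python
def baseline_days(fast_days: float, speedup: float) -> float:
    """Training time on the slower system implied by a speedup factor."""
    return fast_days * speedup


alps_days = 2.0  # estimated GPT-3 training time on Alps
speedup = 7.0    # "7 times faster", read as a ratio of training times

selene_days = baseline_days(alps_days, speedup)
print(f"Implied Selene training time: about {selene_days:.0f} days")
```

On that reading, the same training run would take about two weeks on Selene.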
CSCS users will be able to apply this AI performance to a wide range of emerging scientific research that benefits from natural language understanding, such as analyzing and understanding the vast body of knowledge in scientific papers and generating new molecules for drug discovery.
Achieving a performance breakthrough
According to NVIDIA, Grace is built on fourth-generation NVIDIA NVLink® interconnect technology, which provides a record 900 GB/s connection between Grace and NVIDIA GPUs, for total bandwidth 30 times higher than today’s leading servers. CPU-to-CPU speed exceeds 600 GB/s.
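A rough sense of what these bandwidth figures mean can be had from a simple transfer-time calculation. The sketch below assumes an idealized link with no latency or protocol overhead, derives the baseline bandwidth implied by the “30 times” claim, and uses a hypothetical 1,800 GB payload chosen only to give round numbers.

```python
def transfer_seconds(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Idealized time to move `size_gb` over a link, ignoring all overhead."""
    return size_gb / bandwidth_gb_per_s


NVLINK_GB_PER_S = 900.0  # Grace-to-GPU figure quoted by NVIDIA
CLAIMED_RATIO = 30.0     # "30 times higher than today's leading servers"

# Baseline implied by the two quoted numbers (an inference, not an NVIDIA figure).
implied_baseline = NVLINK_GB_PER_S / CLAIMED_RATIO

payload_gb = 1800.0  # hypothetical data volume
t_grace = transfer_seconds(payload_gb, NVLINK_GB_PER_S)
t_baseline = transfer_seconds(payload_gb, implied_baseline)

print(f"Implied baseline bandwidth: {implied_baseline:.0f} GB/s")
print(f"Transfer at 900 GB/s:       {t_grace:.0f} s")
print(f"Transfer at baseline:       {t_baseline:.0f} s")
```

Under these assumptions the same payload moves in 2 seconds instead of 60, which illustrates why the CPU-GPU link matters for models too large to fit in GPU memory.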
Grace will also use an innovative LPDDR5x memory subsystem, which offers twice the bandwidth and 10 times the energy efficiency of DDR4 memory. In addition, the new architecture provides cache coherence within a single memory address space, combining system memory and HBM GPU memory to simplify programming.
Grace will be supported by the NVIDIA HPC software development kit and the full suite of CUDA and CUDA-X libraries, which accelerate more than 2,000 GPU applications and help scientists and researchers tackle major global challenges faster.